Abstract

We consider the transmission of a Gaussian vector source over a multi-dimensional Gaussian
channel where a random or a fixed subset of the channel outputs is erased. We consider the setup
where the only encoding operation allowed is a linear unitary transformation on the source. For
such a setup, we consider the minimum mean-square error (MMSE) as the performance criterion
and investigate the MMSE performance both on average and in terms of guarantees that hold with
high probability as a function of system parameters. Necessary conditions for optimal unitary
encoders are established, and explicit solutions for a class of settings are presented. Although there
are observations (including evidence provided by the compressed sensing community) that may
suggest that the discrete Fourier transform (DFT) matrix is an optimum
unitary matrix for any eigenvalue distribution, we provide a counterexample. Finally, we consider
equidistant sampling of circularly wide sense stationary (c.w.s.s.) signals, and present an upper
bound that summarizes the effect of the sampling rate and the eigenvalue distribution.

These findings may be useful in understanding the geometric dependence of signal uncertainty 
in a stochastic process. In particular, unlike information theoretic measures such as entropy, we
wish to highlight the basis dependence of uncertainty in a signal from another perspective. The
unitary encoding space restriction allows us to extract the most and least favorable signal bases for 
estimation. 

Index Terms 

random field estimation, compressive sensing, discrete Fourier Transform (DFT) 

1 Introduction 

In this paper, we consider the transmission of a Gaussian vector source over a multi-dimensional
Gaussian channel where a random or a fixed subset of the channel outputs is erased. For such a model,
we consider the setup where the only encoding operation allowed is a linear unitary transformation on 
the source. 

In the following, we make the system model precise and introduce the four problems which will be 
considered in the article. 
1.1 Source and Measurement Models and Problem Definitions



In this section, we will formulate a family of estimation problems to investigate the relationship between 
the MMSE and various measurement strategies. 

The problems we will formulate in the following will help us explore the relationship between the 
MMSE and the spread of the uncertainty of the signal in the measurement domain. We note that the 
concepts that are traditionally used in the information theory literature as measures of dependency 
or uncertainty in signals (such as degree of freedom, or entropy) are mostly defined independent of 
the coordinate system in which the signal is to be measured. For example, the concept of entropy 
for discrete time signals allows applying arbitrary invertible transformations and processing. As an 
example one may consider the Gaussian case: the entropy solely depends on the eigenvalue spectrum 
of the covariance matrix, hence making the concept blind to the coordinate system in which the signal
lies.

Here we would like to explore the basis dependency of uncertainty in a signal in an estimation framework.
With this motivation, we consider the following noisy measurement system 

y = Hx + n, (1) 

where $x \in \mathbb{C}^N$ is the unknown input proper complex Gaussian random vector, $n \in \mathbb{C}^M$ is the proper
complex Gaussian vector denoting the measurement noise, and $y \in \mathbb{C}^M$ is the measurement vector. $H$
is the $M \times N$ measurement matrix.

We assume that $x$ and $n$ are statistically independent zero-mean random vectors with covariance
matrices $K_x = E[xx^\dagger]$ and $K_n = E[nn^\dagger]$, respectively. We assume that the components of $n$ are
independent and identically distributed (i.i.d.) with $E[n_i n_i^\dagger] = \sigma_n^2 > 0$, hence $K_n = \sigma_n^2 I \succ 0$, where
$I$ is the identity matrix of matching dimension; $I_N$ denotes the $N \times N$ identity matrix. Let $K_x = U \Lambda_x U^\dagger$ be the singular value decomposition of $K_x$,
where $U$ is an $N \times N$ unitary matrix, and $\Lambda_x = \mathrm{diag}(\lambda_1, \ldots, \lambda_N)$. Here $\dagger$ denotes complex conjugate
transpose. When needed, we emphasize the random variables the expectations are taken with respect
to; we denote the expectation with respect to the random measurement matrix by $E_H[\cdot]$, and the
expectation with respect to random signals involved (including $x$ and $n$) by $E_S[\cdot]$.

In all of the problems we assume that the receiver has access to channel realization information. 

In the following, we present four problems that will be considered in this article. 

PROBLEM P1 (Best Unitary Encoder For Random Channels): Let $\mathcal{U}_N$ be the set of $N \times N$ unitary
matrices: $\{U \in \mathbb{C}^{N \times N} : U^\dagger U = I_N\}$. We consider the following minimization problem

$\inf_{U \in \mathcal{U}_N} E_{H,S}[\|x - E[x|y]\|^2]$, (2)

where the expectation with respect to H is over admissible random measurement strategies: random 
scalar Gaussian channel (only one of the components is measured each time) or Gaussian erasure 
channel (each component of the unknown vector is erased independently and with equal probability). 

PROBLEM P2 (Error Bounds For Random Sampling/Support at a Fixed Measurement Domain):
Are there any nontrivial lower bounds (i.e. bounds close to 1) on

$P(E_S[\|x - E[x|y]\|^2] < f_{P2}(\Lambda_x, U, \sigma_n^2))$ (3)

for some function $f_{P2}$, where $f_{P2}$ denotes a sufficiently small error level given $\mathrm{tr}(K_x)$ and $\sigma_n^2$? In
particular, when there is no noise, we will be investigating the probability that the error is zero.

PROBLEM P3 (Error Bounds For Random Projections): Let $x \in \mathbb{R}^N$ and $y \in \mathbb{R}^M$. Are there any
nontrivial lower bounds (i.e. bounds close to 1) on

$P(E_S[\|x - E[x|y]\|^2] < f_{P3}(\Lambda_x, U, \sigma_n^2))$ (4)

for some function $f_{P3}$ under the scenario of sampling with random projections (entries of $H$ are i.i.d.
Gaussian) with fixed eigenvalue distribution? How do $\Lambda_x$ and $H$ affect the performance? Here
$f_{P3}$ denotes a sufficiently small error level given $\mathrm{tr}(K_x)$ and $\sigma_n^2$.

We note that in the context of this problem it is not meaningful to seek the best orthonormal
encoder $U$ (i.e. $U \in \mathbb{R}^{N \times N} : U^\dagger U = I_N$). This is because the entries of $H$ are i.i.d. Gaussian, and
such a random matrix $H$ is left and right 'rotationally invariant': For any orthonormal matrix $U$, the
random matrices $UH$, $HU$ and $H$ have the same distribution. See [Lemma 5, [1]].

PROBLEM P4 (Estimation Error of Equidistant Sampling of Circularly Wide Sense Stationary 
Signals): What is the MMSE of equidistant sampling for a c.w.s.s. signal? What is its relationship
with the eigenvalue distribution and the rate of sampling?

We note that the dependence of signal uncertainty in the signal basis has been considered in different 
contexts in the information theory literature. The approach of applying coordinate transformations 
to orthogonalize signal components takes place in many signal reconstruction and information theory 
problems. For example the rate-distortion function for a Gaussian random vector is obtained by 
applying an uncorrelating transform to the source, or approaches such as the Karhunen-Loeve expansion 
are used extensively. On the other hand, the compressive sensing community heavily makes use of the 
notion of coherence of bases, see for example [21 El 0]. The coherence of two bases, say the intrinsic
signal domain $\psi$ and the orthogonal measurement system $\phi$, is measured with $\mu = \max_{i,j} |u_{ij}|$, $U = \phi\psi$,
providing a measure of how concentrated the columns of $U$ are. When $\mu$ is small, one says the
mutual coherence is small. As the coherence gets smaller, fewer samples are required to provide good 
performance guarantees. 
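
As a small numerical illustration of the coherence measure, the following Python sketch evaluates $\mu$ for two choices of $U$. It assumes NumPy and the unitary DFT matrix with the usual $1/\sqrt{N}$ normalization; the helper names `dft_matrix` and `coherence` are ours and are introduced only for this illustration.

```python
import numpy as np

def dft_matrix(n):
    """Unitary n x n DFT matrix with entries exp(j*2*pi*t*k/n) / sqrt(n)."""
    idx = np.arange(n)
    return np.exp(2j * np.pi * np.outer(idx, idx) / n) / np.sqrt(n)

def coherence(u):
    """Coherence mu = max_{i,j} |u_ij| of a unitary matrix u."""
    return np.max(np.abs(u))

n = 16
mu_dft = coherence(dft_matrix(n))    # every entry has modulus 1/sqrt(n)
mu_identity = coherence(np.eye(n))   # each column is concentrated in one entry
```

For a unitary matrix, $\mu$ lies between $1/\sqrt{N}$ (columns spread as evenly as possible, as for the DFT matrix) and 1 (columns fully concentrated, as for the identity).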

The total uncertainty in the signal as quantified by information theoretic measures such as entropy 
(or eigenvalues) and the spread of this uncertainty (basis) reflect different aspects of the dependence in 
a signal. The estimation problems we will consider may be seen as an investigation of the relationship 
between the MMSE and these two measures. 

1.2 Literature Review 

In the following, we provide a brief overview of the related literature. An important model in the article 
is the Gaussian erasure channel, where each component of the unknown vector is erased independently 
and with equal probability, and the transmitted components are observed through Gaussian noise. This 
type of model may be used to formulate various types of transmission with low reliability scenarios, for 
example a Gaussian channel with impulsive noise [5], [6]. This measurement model is also related to the
measurement model considered in the compressive sensing framework, where the scenario
in which each component is erased independently and with equal probability is of central importance
[TIE]. Our work also contributes to the understanding of the MMSE performance of such measurement
schemes under noise.

The problem of optimization of precoders or input covariance matrices is formulated in the literature
under different performance criteria: When the channel is not random, [9] considers a related trace
minimization problem, and [10] a determinant maximization problem, which correspond to optimization
of the MMSE and mutual information performance respectively in our formulation. [11], [12] formulate
the problem with the criterion of mutual information, whereas [13] focuses on the MMSE, and [14] on the
determinant of the mean-square error matrix. [15], [16] present a general framework based on
Schur-convexity. In these works the channel is known at the transmitter, hence it is possible to shape the input
according to the channel. When the channel is a Rayleigh or Rician fading channel, [17] investigates
the best linear encoding problem without restricting the encoder to be unitary. [T] focuses on the
problem of maximizing the mutual information for a Rayleigh fading channel. [5], [6] consider the
erasure channel as in our setting, but with the aim of maximizing the ergodic capacity.

In Problems P2 and P3, we investigate how the results in random matrix theory mostly presented in 
compressive sampling framework can be used to find bounds on the MMSE associated with the described 
measurement scenarios. We note that there are studies that consider the MMSE in compressive sensing 
framework such as [19], which focus on the scenario where the receiver does not know the location of
the signal support. In our case we assume that the receiver has full knowledge of the signal covariance 
matrix. 

1.3 Preliminaries and Notation 

In the following, we present a few definitions and notations that will be used throughout the article. 
Let $\mathrm{tr}(K_x) = P$. Let $D(\delta)$ be the smallest number satisfying $\sum_{i=1}^{D} \lambda_i \geq \delta P$, where $\delta \in (0, 1]$. Hence
for $\delta$ close to one, $D(\delta)$ can be considered as an effective rank of the covariance matrix and also the
effective number of "degrees of freedom" (DOF) of the signal family. For $\delta$ close to one, we drop the
dependence on $\delta$ and use the term effective DOF to represent $D(\delta)$. A closely related concept is the
(effective) bandwidth. We use the term "bandwidth" for the DOF of a signal whose canonical domain 
is the Fourier domain, i.e. whose unitary transform is given by the discrete Fourier Transform (DFT) 
matrix. 
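
The effective DOF can be computed directly from the eigenvalue spectrum. The sketch below is a minimal illustration that assumes the reading of the definition as the smallest $D$ whose $D$ largest eigenvalues sum to at least $\delta P$; the function name `effective_dof` and the example spectrum are hypothetical.

```python
import numpy as np

def effective_dof(eigenvalues, delta):
    """Smallest D such that the D largest eigenvalues sum to at least delta * P."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # decreasing order
    total = lam.sum()                                          # P = tr(K_x)
    cumulative = np.cumsum(lam)
    # A tiny relative tolerance guards against floating-point round-off at delta = 1.
    return int(np.argmax(cumulative >= delta * total * (1 - 1e-12)) + 1)

lam = [8.0, 4.0, 2.0, 1.0, 0.5, 0.25, 0.125, 0.125]  # P = 16
d_half = effective_dof(lam, 0.5)
d_ninety = effective_dof(lam, 0.9)
d_full = effective_dof(lam, 1.0)
```

A rapidly decaying spectrum such as this one has a small effective DOF for moderate $\delta$, while $\delta = 1$ recovers the number of nonzero eigenvalues.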

Let $\sqrt{-1} = j$. The entries of an $N \times N$ DFT matrix are given by $u_{tk} = \frac{1}{\sqrt{N}} e^{j \frac{2\pi}{N} t k}$, where $0 \leq t, k \leq N - 1$.
We note that the DFT matrix is the diagonalizing unitary transform for all circulant matrices
[20]. In general, a circulant matrix is determined by its first row and defined by the relationship
$c_{tk} = c_{0\,\mathrm{mod}_N(k-t)}$, where rows and columns are indexed by $t$ and $k$, $0 \leq t, k \leq N - 1$, respectively.
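
The diagonalizing property can be checked numerically. The following sketch assumes NumPy, the index convention $c_{tk} = c_{0,(k-t) \bmod N}$ for circulant matrices, and the $1/\sqrt{N}$-normalized DFT matrix; the helper names are ours.

```python
import numpy as np

def dft_matrix(n):
    """Unitary n x n DFT matrix with entries exp(j*2*pi*t*k/n) / sqrt(n)."""
    idx = np.arange(n)
    return np.exp(2j * np.pi * np.outer(idx, idx) / n) / np.sqrt(n)

def circulant_from_first_row(c):
    """Circulant matrix with entries C[t, k] = c[(k - t) mod N]."""
    c = np.asarray(c)
    n = len(c)
    t, k = np.indices((n, n))
    return c[(k - t) % n]

rng = np.random.default_rng(0)
n = 8
first_row = rng.standard_normal(n) + 1j * rng.standard_normal(n)
C = circulant_from_first_row(first_row)
F = dft_matrix(n)

# F^H C F should be diagonal: the columns of F are eigenvectors of C.
D = F.conj().T @ C @ F
max_off_diagonal = np.max(np.abs(D - np.diag(np.diag(D))))
```

The quantity `max_off_diagonal` should be at the level of floating-point round-off for any first row, including non-Hermitian ones as in this example.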

The transpose, complex conjugate and complex conjugate transpose of a matrix $A$ are denoted
by $A^T$, $A^*$ and $A^\dagger$, respectively. The eigenvalues of a matrix $A$ are denoted in decreasing order as
$\lambda_1(A) \geq \lambda_2(A) \geq \ldots \geq \lambda_N(A)$.

Here is a brief summary of the rest of the paper: In Section 2 we consider random channels and
formulate the problem of finding the most favorable unitary transform under average performance. We
investigate the convexity properties of this optimization problem, and obtain conditions of optimality
through variational equalities. We identify special cases where discrete Fourier Transform (DFT)-like
unitary transforms turn out to be the best coordinate transforms (possibly along with other unitary
transforms). Although there are many observations (including evidence provided by the compressed
sensing community) that may suggest that the DFT matrix is an optimum
unitary matrix for any eigenvalue distribution, we provide a counterexample. In Section 3 we illustrate
how some recent results in matrix theory mostly presented in the compressive sampling framework can
be used to find performance guarantees for the MMSE estimation that hold with high probability. In
Section 4 we illustrate how the spread of the eigenvalue distribution and the measurement scheme
contribute to obtaining performance guarantees that hold with high probability for the case of a sampling
matrix with i.i.d. Gaussian entries. In Section 5 we consider equidistant sampling of a circularly wide
sense stationary signal. We give the explicit expression for the MMSE, and show that two times the
total power outside a properly chosen set of indices (a set of indices which do not overlap when shifted
by an amount determined by the sampling rate) provides an upper bound for the MMSE. We conclude
in Section 6.






2 Problem P1: Average Performance of Random Scalar Gaussian
Channel and Gaussian Erasure Channel 



In this section, we consider two closely related random channel structures, and focus on the average
MMSE performance. We assume that the receiver knows the channel information, whereas the
transmitter only knows the channel probability distribution. 

We consider the following measurement strategies: a) (Random Scalar Gaussian Channel) $H = e_i^T$,
$i = 1, \ldots, N$ with probability $\frac{1}{N}$, where $e_i \in \mathbb{R}^N$ is the $i$th unit vector. We denote this sampling strategy
with $S_s$. b) (Gaussian Erasure Channel) $H = \mathrm{diag}(\delta_i)$, where $\delta_i$ are i.i.d. Bernoulli random variables
with probability of success $p \in [0, 1]$. We denote this sampling strategy with $S_b$.

We are interested in the following problem: 

PROBLEM P1 (Best Unitary Encoder For Random Channels): Let $K_x$ denote the covariance
matrix of $x$. Let $K_x = U \Lambda_x U^\dagger$ be the singular value decomposition of $K_x$, where $U$ is an $N \times N$ unitary
matrix, and $\Lambda_x = \mathrm{diag}(\lambda_1, \ldots, \lambda_N)$. We fix the eigenvalue distribution with $\Lambda_x = \mathrm{diag}(\lambda_i) \succeq 0$, where
$\sum_i \lambda_i = P < \infty$. Let $\mathcal{U}_N$ be the set of $N \times N$ unitary matrices: $\{U \in \mathbb{C}^{N \times N} : U^\dagger U = I\}$.

We consider the following minimization problem

$\inf_{U \in \mathcal{U}_N} E_{H,S}[\|x - E[x|y]\|^2]$, (5)

where the expectation with respect to $H$ is over admissible measurement strategies $S_s$ or $S_b$. Hence
we want to determine the best unitary encoder for the random scalar Gaussian channel or Gaussian
erasure channel.



We note that [5] and [6] consider the erasure channel model ($S_b$ in our notation) with the aim
of maximizing the ergodic capacity. Their formulations let the transmitter also shape the eigenvalue 
distribution of the source, whereas ours does not. 

We note that our problem formulation is equivalent to the following unitary encoding problem:
$\inf_{U \in \mathcal{U}_N} E_{H,S}[\|w - E[w|y]\|^2]$, where $K_w = \Lambda_x$, $y = HUw + n$. We also note that by solving
Problem P1 for the measurement scheme in (1), one also obtains the solution for the generalized
set-up $y = HVx + n$, where $V$ is any unitary matrix: Let $U_o$ denote an optimal unitary matrix for the
scheme in (1). Then $V^\dagger U_o \in \mathcal{U}_N$ is an optimal unitary matrix for the generalized set-up.

2.1 First Order Conditions for Optimality 

Under a given measurement matrix $H$, by standard arguments the MMSE estimate is given by $E[x|y] =
\hat{x} = K_{xy} K_y^{-1} y$, where $K_{xy} = E[xy^\dagger] = K_x H^\dagger$, and $K_y = E[yy^\dagger] = H K_x H^\dagger + K_n$. We note that since
$K_n \succ 0$, we have $K_y \succ 0$, and hence $K_y^{-1}$ exists. The associated MMSE can be expressed as [21, Ch. 2]

$E_S[\|x - E[x|y]\|^2] = \mathrm{tr}(K_x - K_{xy} K_y^{-1} K_{xy}^\dagger)$ (6)
$= \mathrm{tr}(K_x - K_x H^\dagger (H K_x H^\dagger + K_n)^{-1} H K_x)$ (7)
$= \mathrm{tr}(U \Lambda_x U^\dagger - U \Lambda_x U^\dagger H^\dagger (H U \Lambda_x U^\dagger H^\dagger + K_n)^{-1} H U \Lambda_x U^\dagger)$ (8)

Let $B = \{i : \lambda_i > 0\}$, and let $U_B$ denote the $N \times |B|$ matrix formed by taking the columns of $U$ indexed
by $B$. Similarly, let $\Lambda_{x,B}$ denote the $|B| \times |B|$ matrix formed by taking the columns and rows of $\Lambda_x$ indexed
by $B$ in the respective order. We note that $U_B^\dagger U_B = I_{|B|}$, whereas the equality $U_B U_B^\dagger = I_N$ is not true
unless $|B| = N$. Also note that $\Lambda_{x,B}$ is always invertible. The singular value decomposition of $K_x$ can
be written as $K_x = U \Lambda_x U^\dagger = U_B \Lambda_{x,B} U_B^\dagger$. Hence the error may be rewritten as

$E_S[\|x - E[x|y]\|^2] = \mathrm{tr}(U_B \Lambda_{x,B} U_B^\dagger - U_B \Lambda_{x,B} U_B^\dagger H^\dagger (H U_B \Lambda_{x,B} U_B^\dagger H^\dagger + K_n)^{-1} H U_B \Lambda_{x,B} U_B^\dagger)$ (9)
$= \mathrm{tr}(\Lambda_{x,B} - \Lambda_{x,B} U_B^\dagger H^\dagger (H U_B \Lambda_{x,B} U_B^\dagger H^\dagger + K_n)^{-1} H U_B \Lambda_{x,B})$ (10)

$= \mathrm{tr}((\Lambda_{x,B}^{-1} + \frac{1}{\sigma_n^2} U_B^\dagger H^\dagger H U_B)^{-1})$ (11)

where (10) follows from the identity $\mathrm{tr}(U_B M U_B^\dagger) = \mathrm{tr}(M U_B^\dagger U_B) = \mathrm{tr}(M)$ with an arbitrary matrix $M$
with consistent dimensions. Here (11) follows from the fact that $\Lambda_{x,B}$ and $K_n$ are nonsingular and the
Sherman-Morrison-Woodbury identity, which has the following form for our case (see for example [22]
and the references therein)

$K_1 - K_1 A^\dagger (A K_1 A^\dagger + K_2)^{-1} A K_1 = (K_1^{-1} + A^\dagger K_2^{-1} A)^{-1}$, (12)

where $K_1$ and $K_2$ are nonsingular.
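
The equivalence between the direct expression in (7) and the reduced form in (11) can be checked numerically. The sketch below is an illustration under stated assumptions: NumPy is available, the unitary matrix is drawn as the QR factor of a complex Gaussian matrix, and the dimensions, spectrum and noise level are arbitrary example values.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, sigma2 = 6, 3, 0.5

# Random unitary U (QR factor of a complex Gaussian matrix) and a rank-deficient spectrum.
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
U, _ = np.linalg.qr(A)
lam = np.array([3.0, 2.0, 1.0, 0.0, 0.0, 0.0])
K_x = U @ np.diag(lam) @ U.conj().T

# Arbitrary complex measurement matrix and white noise covariance K_n = sigma2 * I.
H = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
K_n = sigma2 * np.eye(M)

# Form (7): tr(K_x - K_x H^H (H K_x H^H + K_n)^{-1} H K_x).
K_y = H @ K_x @ H.conj().T + K_n
mmse_direct = np.trace(K_x - K_x @ H.conj().T @ np.linalg.solve(K_y, H @ K_x)).real

# Form (11): restrict to the index set B of nonzero eigenvalues.
B = lam > 0
G = H @ U[:, B]
mmse_reduced = np.trace(
    np.linalg.inv(np.diag(1.0 / lam[B]) + G.conj().T @ G / sigma2)
).real
```

The reduced form only inverts a $|B| \times |B|$ matrix and remains well defined when some eigenvalues are zero, which is why the remainder of the development works with $U_B$ and $\Lambda_{x,B}$.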

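Let the possible sampling schemes be indexed by the variable $k$, where $1 \leq k \leq N$ for $S_s$, and
$1 \leq k \leq 2^N$ for $S_b$. Let $H_k$ be the corresponding sampling matrix. Let $p_k$ be the probability of the $k$th
sampling scheme.

We can express the objective function as

$E_{H,S}[\|x - E[x|y]\|^2] = E_H[\mathrm{tr}((\Lambda_{x,B}^{-1} + \frac{1}{\sigma_n^2} U_B^\dagger H^\dagger H U_B)^{-1})]$ (13)

$= \sum_k p_k \, \mathrm{tr}((\Lambda_{x,B}^{-1} + \frac{1}{\sigma_n^2} U_B^\dagger H_k^\dagger H_k U_B)^{-1})$ (14)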
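
For small $N$ the average in (14) can be evaluated by direct enumeration of the sampling schemes. The following sketch assumes NumPy, strictly positive eigenvalues (so that $U_B = U$), and example values for the spectrum, noise variance and $p$; the function names are hypothetical.

```python
import itertools
import numpy as np

def mmse(lam, U, H, sigma2):
    """tr((Lambda^{-1} + U^H H^H H U / sigma2)^{-1}); assumes every entry of lam is > 0."""
    G = H @ U
    X = np.diag(1.0 / lam) + G.conj().T @ G / sigma2
    return np.trace(np.linalg.inv(X)).real

def average_mmse_scalar(lam, U, sigma2):
    """Strategy S_s: H = e_i^T, each i chosen with probability 1/N."""
    N = len(lam)
    I = np.eye(N)
    return sum(mmse(lam, U, I[i:i + 1, :], sigma2) for i in range(N)) / N

def average_mmse_erasure(lam, U, sigma2, p):
    """Strategy S_b: H = diag(delta), delta_i i.i.d. Bernoulli(p); sums over all 2^N patterns."""
    N = len(lam)
    total = 0.0
    for pattern in itertools.product([0.0, 1.0], repeat=N):
        delta = np.array(pattern)
        prob = np.prod(np.where(delta == 1.0, p, 1.0 - p))
        total += prob * mmse(lam, U, np.diag(delta), sigma2)
    return total

lam = np.array([4.0, 2.0, 1.0, 1.0])
sigma2 = 0.5
idx = np.arange(len(lam))
U_dft = np.exp(2j * np.pi * np.outer(idx, idx) / len(lam)) / np.sqrt(len(lam))
U_id = np.eye(len(lam))

scalar_dft = average_mmse_scalar(lam, U_dft, sigma2)
scalar_id = average_mmse_scalar(lam, U_id, sigma2)
erasure_dft = average_mmse_erasure(lam, U_dft, sigma2, p=0.5)
erasure_id = average_mmse_erasure(lam, U_id, sigma2, p=0.5)
```

The enumeration for $S_b$ grows as $2^N$, so this approach is only meant for small examples; comparing the returned values for different $U$ gives a quick way to explore candidate encoders.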

We note that the objective function is a continuous function of $U_B$. We also note that the feasible
set defined by $\{U_B \in \mathbb{C}^{N \times |B|} : U_B^\dagger U_B = I_{|B|}\}$
is a closed and bounded subset of $\mathbb{C}^{N \times |B|}$, hence compact.
Hence the minimum is attained since we are minimizing a continuous function over a compact set (but
the optimum $U_B$ is not necessarily unique).

We note that in general, the feasible region is not a convex set. To see this, let $U_1, U_2 \in \mathcal{U}_N$
and $\theta \in [0,1]$. In general $\theta U_1 + (1 - \theta) U_2 \notin \mathcal{U}_N$. For instance let $N = 1$, $U_1 = 1$, $U_2 = -1$; then
$\theta U_1 + (1 - \theta) U_2 = 2\theta - 1 \notin \mathcal{U}_1$ for all $\theta \in (0, 1)$. Even if the unitary matrix constraint is relaxed, we
observe that the objective function is in general neither a convex nor a concave function of the matrix
$U_B$. To see this, one can check the second derivative to see if $\nabla^2_{U_B} f(U_B) \succeq 0$ or $\nabla^2_{U_B} f(U_B) \preceq 0$, where
$f(U_B) = \sum_k p_k \, \mathrm{tr}((\Lambda_{x,B}^{-1} + \frac{1}{\sigma_n^2} U_B^\dagger H_k^\dagger H_k U_B)^{-1})$. For example, let $N = 1$, $U \in \mathbb{R}$, $\sigma_n^2 = 1$, $\lambda > 0$, and
$p > 0$ for $S_b$. Then $f(U) = \sum_k p_k \frac{1}{\lambda^{-1} + U^\dagger H_k^\dagger H_k U}$ can be written as $f(U) = (1 - q)\lambda + q \frac{1}{\lambda^{-1} + U^\dagger U}$, where
$q \in (0, 1]$ is the probability that the one possible measurement is done, and $1 - q$ is the probability it
is not done. Hence $q = 1$ for $S_s$, and $q = p$ for $S_b$. Hence $\nabla^2_U f(U) = 2q \frac{3U^2 - \lambda^{-1}}{(\lambda^{-1} + U^2)^3}$, whose sign changes
depending on $\lambda$ and $U$. Hence neither $\nabla^2_U f(U) \succeq 0$ nor $\nabla^2_U f(U) \preceq 0$ holds for all $U \in \mathbb{R}$.
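
The sign change in the scalar example can be confirmed numerically. The sketch below assumes the scalar objective $f(U) = (1-q)\lambda + q/(\lambda^{-1} + U^2)$ with $\sigma_n^2 = 1$ and compares its analytic second derivative with a central finite difference; the parameter values are arbitrary.

```python
import numpy as np

def f(U, lam, q):
    """Scalar objective f(U) = (1 - q) * lam + q / (1/lam + U^2), with sigma_n^2 = 1."""
    return (1.0 - q) * lam + q / (1.0 / lam + U ** 2)

def second_derivative(U, lam, q):
    """Analytic second derivative 2q (3U^2 - 1/lam) / (1/lam + U^2)^3."""
    return 2.0 * q * (3.0 * U ** 2 - 1.0 / lam) / (1.0 / lam + U ** 2) ** 3

def finite_difference(U, lam, q, h=1e-4):
    """Central finite-difference approximation of the second derivative."""
    return (f(U + h, lam, q) - 2.0 * f(U, lam, q) + f(U - h, lam, q)) / h ** 2

lam, q = 2.0, 0.5
curvature_at_zero = second_derivative(0.0, lam, q)   # 3U^2 < 1/lam: concave region
curvature_at_one = second_derivative(1.0, lam, q)    # 3U^2 > 1/lam: convex region
```

The curvature is negative near $U = 0$ and positive once $3U^2 > \lambda^{-1}$, so the function is neither convex nor concave on $\mathbb{R}$.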

In general, the objective function depends only on $U_B$, not $U$. If $U_B$ satisfying $U_B^\dagger U_B = I_{|B|}$,
with $|B| < N$, is an optimal solution, then unitary matrices satisfying $U^\dagger U = I_N$ can be formed by adding
column(s) to $U_B$ without changing the value of the objective function. Hence any such unitary matrix $U$
will also be an optimal solution. Therefore it is sufficient to consider the constraint $\{U_B : U_B^\dagger U_B = I_{|B|}\}$
instead of the condition $\{U : U^\dagger U = I_N\}$, while optimizing the objective function. We also note that
if $U_B$ is an optimal solution, $\exp(j\theta) U_B$ is also an optimal solution, where $0 \leq \theta \leq 2\pi$.

Let $u_i$ be the $i$th column of $U_B$. We can write the unitary matrix constraint as follows:

$u_i^\dagger u_k = \begin{cases} 1, & \text{if } i = k, \\ 0, & \text{if } i \neq k, \end{cases}$ (15)

with $i = 1, \ldots, |B|$, $k = 1, \ldots, |B|$. Since $u_i^\dagger u_k = 0$ iff $u_k^\dagger u_i = 0$, it is sufficient to consider $k \leq i$. Hence
this constraint may be rewritten as

$e_i^T (U_B^\dagger U_B - I_{|B|}) e_k = 0$, $i = 1, \ldots, |B|$, $k = 1, \ldots, i$, (16)

where $e_i \in \mathbb{R}^{|B|}$ is the $i$th unit vector.

We now consider the first order conditions for optimality. We note that we are optimizing a
real valued function of a complex valued matrix $U_B \in \mathbb{C}^{N \times |B|}$. Let $U_{B,R} = \Re\{U_B\} \in \mathbb{R}^{N \times |B|}$ and
$U_{B,I} = \Im\{U_B\} \in \mathbb{R}^{N \times |B|}$ denote the real and imaginary parts of the complex matrix $U_B$, so that
$U_B = U_{B,R} + jU_{B,I}$. One may address this optimization problem by considering the objective function
as a mapping from these two real components $U_{B,R}$ and $U_{B,I}$ instead of the complex valued $U_B$. In
the following development, we consider this real framework along with the complex framework.

Let $\tilde{U}_B = \begin{bmatrix} U_{B,R} \\ U_{B,I} \end{bmatrix} \in \mathbb{R}^{2N \times |B|}$. We first consider the set of constraint gradients, and investigate
conditions for constraint qualification.



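Lemma 2.1 The constraints can be expressed as

$e_i^T (U_{B,R}^T U_{B,R} + U_{B,I}^T U_{B,I}) e_k = e_i^T I_{|B|} e_k$, $(i,k) \in \gamma$ (17)
$e_i^T (U_{B,R}^T U_{B,I} - U_{B,I}^T U_{B,R}) e_k = 0$, $(i,k) \in \bar{\gamma}$ (18)

where $\gamma = \{(i,k) \,|\, i = 1, \ldots, |B|, \ k = 1, \ldots, i\}$, and $\bar{\gamma} = \{(i,k) \,|\, i = 1, \ldots, |B|, \ k = 1, \ldots, i-1\}$. The set
of constraint gradients with respect to $\tilde{U}_B$ is given by

$\left\{ \begin{bmatrix} U_{B,R}(e_i e_k^T + e_k e_i^T) \\ U_{B,I}(e_i e_k^T + e_k e_i^T) \end{bmatrix} \Big| \ (i,k) \in \gamma \right\} \cup \left\{ \begin{bmatrix} -U_{B,I}(e_i e_k^T - e_k e_i^T) \\ U_{B,R}(e_i e_k^T - e_k e_i^T) \end{bmatrix} \Big| \ (i,k) \in \bar{\gamma} \right\}$ (19)

The elements of this set are linearly independent for any matrix $U_B$ satisfying $U_B^\dagger U_B = I_{|B|}$.
Proof: Proof is given in Section 7.1 of the Appendix.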

Since the constraint gradients are linearly independent for any matrix $U_B$ satisfying $U_B^\dagger U_B = I_{|B|}$, the
linear independence constraint qualification (LICQ) holds for any feasible $U_B$ [23, Defn. 12.4]. Therefore,
the first order condition $\nabla_{\tilde{U}_B} \tilde{L}(\tilde{U}_B, \nu, \upsilon) = 0$ together with the condition $U_B^\dagger U_B = I_{|B|}$ is necessary for
optimality [23, Thm 12.1], where $\tilde{L}(\tilde{U}_B, \nu, \upsilon)$ is the Lagrangian for some Lagrangian multiplier vectors
$\nu$ and $\upsilon$. We use the notation $\tilde{L}$ instead of $L$ to emphasize that the function is seen as a mapping from $\tilde{U}_B$
instead of $U_B$.

We note that the unitary matrix constraint in (16) can be also expressed as

$e_i^T (U_B^\dagger U_B - I_{|B|}) e_k = 0$, $(i,k) \in \bar{\gamma}$ (20)
$e_k^T (U_B^\dagger U_B - I_{|B|}) e_k = 0$, $k \in \{1, \ldots, |B|\}$ (21)

We note that in general, $e_i^T (U_B^\dagger U_B) e_k = u_i^\dagger u_k \in \mathbb{C}$ for $i \neq k$, and $e_k^T (U_B^\dagger U_B) e_k = u_k^\dagger u_k \in \mathbb{R}$. Hence
(20) and (21) express the complex and real valued constraints, respectively.

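Now we can express the Lagrangian as follows [please see Section 7.2 of the Appendix for a discussion]

$L(U_B, \nu, \upsilon) = \sum_k p_k \, \mathrm{tr}((\Lambda_{x,B}^{-1} + \frac{1}{\sigma_n^2} U_B^\dagger H_k^\dagger H_k U_B)^{-1})$ (22)
$+ \sum_{(i,k) \in \bar{\gamma}} \nu_{i,k} \, e_i^T (U_B^\dagger U_B - I_{|B|}) e_k + \sum_{(i,k) \in \bar{\gamma}} \nu_{i,k}^* \, e_i^T (U_B^T U_B^* - I_{|B|}) e_k$ (23)
$+ \sum_{k=1}^{|B|} \upsilon_k \, e_k^T (U_B^\dagger U_B - I_{|B|}) e_k$ (24)

where $\nu_{i,k} \in \mathbb{C}$, $(i,k) \in \bar{\gamma}$ and $\upsilon_k \in \mathbb{R}$, $k \in \{1, \ldots, |B|\}$ are Lagrange multipliers.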

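Let us define $\tilde{L}(\tilde{U}_B, \nu, \upsilon) = L(U_B, \nu, \upsilon)$, the Lagrangian seen as a mapping from $\tilde{U}_B$, instead of
$U_B$. Now we consider finding the stationary points for the Lagrangian, i.e. the first order condition
$\nabla_{\tilde{U}_B} \tilde{L}(\tilde{U}_B, \nu, \upsilon) = 0$. We note that this condition is equivalent to $\nabla_{U_B} L(U_B, \nu, \upsilon) = 0$ [Ml 122]. We can
express this last condition explicitly as

$\sum_k p_k (\Lambda_{x,B}^{-1} + \frac{1}{\sigma_n^2} U_B^\dagger H_k^\dagger H_k U_B)^{-2} U_B^\dagger H_k^\dagger H_k = \sum_{(i,k) \in \bar{\gamma}} \nu_{i,k} \, e_k e_i^T U_B^\dagger + \sum_{(i,k) \in \bar{\gamma}} \nu_{i,k}^* \, e_i e_k^T U_B^\dagger + \sum_{k=1}^{|B|} \upsilon_k \, e_k e_k^T U_B^\dagger$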

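where we absorbed any constants into Lagrange multipliers. In the derivation of these expressions, we have
used the chain rule, the rules for differentials of products, and the identity $d\,\mathrm{tr}(X^{-1}) = -\mathrm{tr}(X^{-2} dX)$,
see for example [25]. In particular,

$d(e_i^T U_B^\dagger U_B e_k) = d(\mathrm{tr}(e_i^T U_B^\dagger U_B e_k))$ (25)

$= \mathrm{tr}(e_i^T U_B^\dagger \, dU_B \, e_k + e_i^T d(U_B^\dagger) U_B e_k)$ (26)

$= \mathrm{tr}(e_k e_i^T U_B^\dagger \, dU_B + (dU_B^*)^T U_B e_k e_i^T)$ (27)

$= \mathrm{tr}(e_k e_i^T U_B^\dagger \, dU_B + e_i e_k^T U_B^T \, dU_B^*)$. (28)

$d(\mathrm{tr}((\Lambda_{x,B}^{-1} + \frac{1}{\sigma_n^2} U_B^\dagger H_k^\dagger H_k U_B)^{-1})) = -\frac{1}{\sigma_n^2} \mathrm{tr}((\Lambda_{x,B}^{-1} + \frac{1}{\sigma_n^2} U_B^\dagger H_k^\dagger H_k U_B)^{-2} \, d(U_B^\dagger H_k^\dagger H_k U_B))$ (29)

$= -\frac{1}{\sigma_n^2} \mathrm{tr}((\Lambda_{x,B}^{-1} + \frac{1}{\sigma_n^2} U_B^\dagger H_k^\dagger H_k U_B)^{-2} (U_B^\dagger H_k^\dagger H_k \, dU_B + d(U_B^\dagger) H_k^\dagger H_k U_B))$. (30)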



Remark 2.1 For the random scalar Gaussian channel, we can analytically show that these conditions
are satisfied by the DFT matrix and the identity matrix. It is not surprising that both the DFT matrix
and the identity matrix satisfy these equations, since this optimality condition is the same for both
minimizing and maximizing the objective function. We show that the DFT matrix is indeed one of
the possibly many optimizers for the case where the values of the nonzero eigenvalues are equal in
Lemma 2.3. The minimizing property of the identity matrix in the noiseless case is investigated in
Lemma 2.4.

For the Gaussian erasure channel, we show that the observations presented in the compressive sensing
literature imply that the MMSE is small with high probability for the DFT matrix (see Section 3).
Although these observations and the other special cases presented in Section 2.2 may suggest
that the DFT matrix is an optimum solution for the general case, we show that this is not the case
by presenting a counterexample where another unitary matrix not satisfying $|u_{ij}|^2 = 1/N$ outperforms
the DFT [Lemma 2.7].
2.2 Special Cases



In this section, we consider some related special cases. For the random scalar Gaussian channel, we will
show that when the nonzero eigenvalues are equal any covariance matrix (with the given eigenvalues)
having a constant diagonal is an optimum solution [Lemma 2.3]. This includes Toeplitz covariance
matrices or covariance matrices with any unitary transform satisfying $|u_{ij}|^2 = 1/N$. We note that the
DFT matrix satisfies the $|u_{ij}|^2 = 1/N$ condition, and always produces circulant covariance matrices. We
will also show that for both channel structures, for the noiseless case (under some conditions) regardless
of the entropy or degree of freedom of a signal, the worst coordinate transformation is the same, and
given by the identity matrix [Lemma 2.4].

For the Gaussian erasure channel, we will show that when only one of the eigenvalues is nonzero (i.e.
the rank of the covariance matrix is one), any unitary transform satisfying $|u_{ij}|^2 = 1/N$ is an optimizer
[Lemma 2.5]. We will also show that under the relaxed condition $\mathrm{tr}(K_x^{-1}) = R$, the best covariance
matrix is circulant, hence the best unitary transform is the DFT matrix [Lemma 2.6]. Furthermore in
the next section, we will show that the observations presented in the compressive sensing literature imply
that the MMSE is small with high probability when $|u_{ij}|^2 = 1/N$. Although all these observations may
suggest that the DFT matrix is an optimum solution in the general case, we will show
that this is not the case by presenting a counterexample where another unitary matrix not satisfying
$|u_{ij}|^2 = 1/N$ outperforms the DFT matrix [Lemma 2.7].

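Before moving on, we note the following relationship between the eigenvalue distribution and the
MMSE. Let $H \in \mathbb{R}^{M \times N}$ be a given sampling matrix which is formed by taking $1 \leq M \leq N$ rows from
the identity matrix. Assume that $\Lambda_x \succ 0$. Let the eigenvalues of a matrix $A$ be denoted in decreasing
order as $\lambda_1(A) \geq \lambda_2(A) \geq \ldots \geq \lambda_N(A)$. The MMSE can be expressed as in (11)

$E[\|x - E[x|y]\|^2] = \mathrm{tr}((\Lambda_x^{-1} + \frac{1}{\sigma_n^2} U^\dagger H^\dagger H U)^{-1})$ (31)

$= \sum_{i=1}^{N} \frac{1}{\lambda_i(\Lambda_x^{-1} + \frac{1}{\sigma_n^2} U^\dagger H^\dagger H U)}$ (32)

$= \sum_{i=M+1}^{N} \frac{1}{\lambda_i(\Lambda_x^{-1} + \frac{1}{\sigma_n^2} U^\dagger H^\dagger H U)} + \sum_{i=1}^{M} \frac{1}{\lambda_i(\Lambda_x^{-1} + \frac{1}{\sigma_n^2} U^\dagger H^\dagger H U)}$ (33)

$\geq \sum_{i=M+1}^{N} \frac{1}{\lambda_{i-M}(\Lambda_x^{-1})} + \sum_{i=1}^{M} \frac{1}{\lambda_i(\Lambda_x^{-1} + \frac{1}{\sigma_n^2} U^\dagger H^\dagger H U)}$ (34)

$\geq \sum_{i=M+1}^{N} \frac{1}{\lambda_{i-M}(\Lambda_x^{-1})} + \sum_{i=1}^{M} \frac{1}{\frac{1}{\lambda_{N-i+1}(\Lambda_x)} + \frac{1}{\sigma_n^2}}$ (35)

$= \sum_{i=M+1}^{N} \lambda_{N-i+M+1}(\Lambda_x) + \sum_{i=N-M+1}^{N} \frac{1}{\frac{1}{\lambda_i(\Lambda_x)} + \frac{1}{\sigma_n^2}}$ (36)

$= \sum_{i=M+1}^{N} \lambda_i(\Lambda_x) + \sum_{i=N-M+1}^{N} \frac{1}{\frac{1}{\lambda_i(\Lambda_x)} + \frac{1}{\sigma_n^2}}$ (37)

where we have used case (b) of the following lemma in (34), and the fact that $\lambda_i(\Lambda_x^{-1} + \frac{1}{\sigma_n^2} U^\dagger H^\dagger H U) \leq
\lambda_i(\Lambda_x^{-1}) + \lambda_1(\frac{1}{\sigma_n^2} U^\dagger H^\dagger H U) = \lambda_i(\Lambda_x^{-1}) + \frac{1}{\sigma_n^2}$ in (35).
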
Lemma 2.2 [4.3.6, [26]] Let $A_1, A_2 \in \mathbb{C}^{N \times N}$ be Hermitian matrices where the rank of $A_2$ is at most $M$.
Then the following holds: (a) $\lambda_{i+M}(A_1) \leq \lambda_i(A_1 + A_2)$, $i = 1, \ldots, N - M$ and (b) $\lambda_{i+M}(A_1 + A_2) \leq
\lambda_i(A_1)$, $i = 1, \ldots, N - M$.

This lower bound is consistent with our intuition: If the eigenvalues are well-spread, that is $D(\delta)$
is large in comparison to $N$ for $\delta$ close to 1, the error cannot be made small without a large number of
measurements.

The first term in (37) may be obtained by the following intuitively appealing alternative argument:
The energy compaction property of the Karhunen-Loeve expansion guarantees that the best representation
of this signal with $M$ variables in the mean-square error sense is obtained by first decorrelating the signal
with $U^\dagger$ and then using the random variables that correspond to the highest $M$ eigenvalues. The
mean-square error of such a representation is given by the sum of the remaining eigenvalues, i.e.
$\sum_{i=M+1}^{N} \lambda_i(\Lambda_x)$. Here we make measurements before decorrelating the signal, and each component is
measured with noise. Hence the error of our measurement scheme is lower bounded by the error of the
optimum scheme, which is exactly the first term in (37). The second term is the MMSE associated
with the measurement scheme where $M$ independent variables with variances given by the $M$ smallest
eigenvalues of $\Lambda_x$ are observed through i.i.d. noise.
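
The lower bound in (37) can be checked against randomly drawn unitary matrices and sampling patterns. The sketch below is an illustration only: it assumes NumPy, strictly positive eigenvalues, unitary matrices obtained from QR factorizations of complex Gaussian matrices, and arbitrary example values for $N$, $M$ and $\sigma_n^2$.

```python
import numpy as np

def lower_bound(lam, M, sigma2):
    """Right-hand side of (37), with the eigenvalues sorted in decreasing order."""
    lam = np.sort(np.asarray(lam, dtype=float))[::-1]
    N = len(lam)
    tail = lam[M:].sum()          # sum of the N - M smallest eigenvalues
    smallest = lam[N - M:]        # the M smallest eigenvalues
    return tail + np.sum(1.0 / (1.0 / smallest + 1.0 / sigma2))

def mmse(lam, U, rows, sigma2):
    """tr((Lambda^{-1} + U^H H^H H U / sigma2)^{-1}) with H the selected rows of the identity."""
    G = U[rows, :]                # equals H @ U
    X = np.diag(1.0 / lam) + G.conj().T @ G / sigma2
    return np.trace(np.linalg.inv(X)).real

rng = np.random.default_rng(2)
N, M, sigma2 = 8, 3, 0.1
lam = np.array([5.0, 3.0, 2.0, 1.0, 0.5, 0.3, 0.1, 0.1])
bound = lower_bound(lam, M, sigma2)

gaps = []
for _ in range(200):
    A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
    U, _ = np.linalg.qr(A)                       # random unitary matrix
    rows = rng.choice(N, size=M, replace=False)  # random choice of M samples
    gaps.append(mmse(lam, U, rows, sigma2) - bound)
min_gap = min(gaps)
```

Since the bound holds for every unitary $U$ and every choice of $M$ rows, `min_gap` should never be negative beyond round-off; how close it gets to zero indicates how tight the bound is for the chosen spectrum.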



Lemma 2.3 Let $\mathrm{tr}(K_x) = P$. Assume that the nonzero eigenvalues are equal, i.e. $\Lambda_{x,B} = \frac{P}{|B|} I_{|B|}$.
Let $K_n = \sigma_n^2 I$. Then the minimum average error for the random scalar Gaussian channel ($H = e_i^T$,
$i = 1, \ldots, N$ with probability $\frac{1}{N}$) is

$P - \frac{P}{|B|} + \frac{P}{|B|} \frac{1}{1 + \frac{P}{N \sigma_n^2}}$ (38)

which is achieved by covariance matrices with constant diagonal. In particular, covariance matrices
whose unitary transform is the DFT matrix satisfy this.

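Proof: (Note that if none of the eigenvalues are zero, $K_x = \frac{P}{N} I_N$ regardless of the unitary transform,
hence the objective function value does not depend on it.) The objective function may be expressed
as in (13)

$E_{H,S}[\|x - E[x|y]\|^2] = \sum_{k=1}^{N} \frac{1}{N} \mathrm{tr}((\frac{|B|}{P} I_{|B|} + \frac{1}{\sigma_n^2} U_B^\dagger H_k^\dagger H_k U_B)^{-1})$ (39)

$= \frac{P}{|B|} \sum_{k=1}^{N} \frac{1}{N} (|B| - 1 + (1 + \frac{P}{|B| \sigma_n^2} H_k U_B U_B^\dagger H_k^\dagger)^{-1})$ (40)

$= \frac{P}{|B|}(|B| - 1) + \frac{P}{|B| N} \sum_{k=1}^{N} (1 + \frac{P}{|B| \sigma_n^2} e_k^T U_B U_B^\dagger e_k)^{-1}$, (41)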

where in (40) we have used Lemma 2 of [17]. We now consider the minimization of the following
function

$\sum_{k=1}^{N} \frac{1}{1 + \frac{P}{|B| \sigma_n^2} (U_B U_B^\dagger)_{kk}} = \sum_{k=1}^{N} \frac{1}{1 + \frac{P}{|B| \sigma_n^2} \frac{|B|}{P} z_k}$ (42)

$= \sum_{k=1}^{N} \frac{1}{1 + \frac{1}{\sigma_n^2} z_k}$ (43)

where $(U_B U_B^\dagger)_{kk} = \frac{|B|}{P} (K_x)_{kk} = \frac{|B|}{P} z_k$ with $z_k = (K_x)_{kk}$. Here $z_k \geq 0$ and $\sum_k z_k = P$, since $\mathrm{tr}(K_x) =
P$. We note that the goal is the minimization of a convex function over a convex region. Since the
objective and constraint functions are differentiable and Slater's condition is satisfied, we consider the
Karush-Kuhn-Tucker (KKT) conditions which are necessary and sufficient for optimality [27]:

$\nabla_z \Big( \sum_{k=1}^{N} \frac{1}{1 + \frac{1}{\sigma_n^2} z_k} + \mu \big( \sum_{k=1}^{N} z_k - P \big) - \sum_{k=1}^{N} \nu_k z_k \Big) = 0$ (44)

where $\mu$, $\nu$ are Lagrange multipliers with $\nu_i \geq 0$, and $\nu_i z_i = 0$, for $i = 1, \ldots, N$. Solving for the KKT
conditions and investigating the set of active constraints for the best objective function value reveals
that the best $z_i$ is given by $z_i = P/N$. We observe that this condition is equivalent to requiring that the
covariance matrix has constant diagonal. This condition can always be satisfied; for example with a
Toeplitz covariance matrix or with any unitary transform satisfying $|u_{ij}|^2 = 1/N$. We note that the
DFT matrix satisfies the $|u_{ij}|^2 = 1/N$ condition, and always produces circulant covariance matrices.
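
The statement of Lemma 2.3 can be illustrated numerically. The sketch below assumes NumPy and our reading of the minimum value in (38) as $P - \frac{P}{|B|} + \frac{P}{|B|}\,\frac{1}{1 + P/(N\sigma_n^2)}$, which is what (41) gives for $z_k = P/N$; the dimensions and parameter values are arbitrary, and the function name is hypothetical.

```python
import numpy as np

def avg_mmse_scalar(lam_B, U_B, sigma2):
    """Average MMSE over H_k = e_k^T, k = 1..N, each with probability 1/N."""
    N = U_B.shape[0]
    inv_lam = np.diag(1.0 / lam_B)
    total = 0.0
    for k in range(N):
        row = U_B[k:k + 1, :]                      # H_k U_B, shape (1, |B|)
        X = inv_lam + row.conj().T @ row / sigma2
        total += np.trace(np.linalg.inv(X)).real / N
    return total

N, nB, P, sigma2 = 8, 3, 6.0, 0.5
lam_B = np.full(nB, P / nB)                        # equal nonzero eigenvalues

# |B| columns of the unitary DFT matrix: every entry has squared modulus 1/N.
idx = np.arange(N)
F = np.exp(2j * np.pi * np.outer(idx, idx) / N) / np.sqrt(N)
value_dft = avg_mmse_scalar(lam_B, F[:, :nB], sigma2)

# Closed-form minimum, as read from (38).
closed_form = P - P / nB + (P / nB) / (1.0 + P / (N * sigma2))

# |B| columns of a random unitary matrix, for comparison.
rng = np.random.default_rng(3)
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
Q, _ = np.linalg.qr(A)
value_random = avg_mmse_scalar(lam_B, Q[:, :nB], sigma2)
```

The DFT-based choice should match the closed form up to round-off, while a generic unitary matrix, whose covariance diagonal is not constant, cannot fall below it.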



Lemma 2.4 We now consider the random scalar channel without noise, and consider the following
maximization problem which searches for the worst coordinate system for a signal to lie in: Let $x \in \mathbb{C}^N$
be a zero-mean proper Gaussian random vector. Let $\Lambda_x = \mathrm{diag}(\lambda_i)$, with $\mathrm{tr}(\Lambda_x) = P$ be given.

$\sup_{U \in \mathcal{U}_N} E[\sum_{t=1}^{N} \|x_t - E[x_t|y]\|^2]$, (45)

where

$y = x_i$ with probability $\frac{1}{N}$, $i = 1, \ldots, N$ (46)
$K_x = U \Lambda_x U^\dagger$. (47)

The solution to this problem is as follows: The maximum value of the objective function is $\frac{N-1}{N} P$.
$U = I$ achieves this maximum value.



Remark 2.2 We emphasize that this result does not depend on the eigenvalue spectrum $\Lambda_x$.



Remark 2.3 We note that when some of the eigenvalues of the covariance matrix are identically zero,
the eigenvectors corresponding to the zero eigenvalues can be chosen freely (as long as the
resulting transform $U$ is unitary).

Proof: The objective function may be written as

$$E\Big[\sum_{t=1}^{N} |x_t - E[x_t \mid y]|^2\Big] = \frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{N} E\big[|x_t - E[x_t \mid x_i]|^2\big] \qquad (48)$$
$$= \frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{N} (1 - |\rho_{i,t}|^2)\,\sigma_{x_t}^2, \qquad (49)$$

where $\rho_{i,t} = E[x_t x_i^\dagger] / (E[|x_t|^2]^{1/2} E[|x_i|^2]^{1/2})$ is the correlation coefficient between $x_t$ and $x_i$, assuming $\sigma_{x_t}^2 = E[|x_t|^2] > 0$, $\sigma_{x_i}^2 > 0$. (Otherwise one may set $\rho_{i,t} = 1$ if $i = t$, and $\rho_{i,t} = 0$ if $i \neq t$.) Now we
observe that $\sigma_{x_t}^2 \ge 0$ and $0 \le |\rho_{i,t}|^2 \le 1$. Hence the maximum value of this function is attained when
$\rho_{i,t} = 0$ for all $t, i$ such that $t \neq i$. We observe that any diagonal unitary matrix $U = \operatorname{diag}(u_{ii})$, $|u_{ii}| = 1$ (and
also any $U\Pi$, where $\Pi$ is a permutation matrix) achieves this maximum value. In particular, the
identity transform $U = I_N$ is an optimal solution.

We note that a similar result holds for the Bernoulli sampling scheme: Let $y = Hx$. Then $\sup_{U} E_{H,S}[\|x - E[x \mid y]\|^2]$, where the expectation with respect to $H$ is over admissible measurement strategies $S_b$, is
$(1 - p)\operatorname{tr}(K_x)$, which is achieved by any $U\Pi$, where $U = \operatorname{diag}(u_{ii})$, $|u_{ii}| = 1$, and $\Pi$ is a permutation matrix.
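As a sanity check of Lemma 2.4, the sketch below evaluates the average error of the noiseless random scalar channel for the identity basis and for a DFT basis. It is only an illustration under assumed values: the spectrum `eigs`, the dimension `N` and the helper `avg_error_one_sample` are hypothetical choices, and the per-sample error uses the standard Gaussian conditioning formula.

```python
import numpy as np

def avg_error_one_sample(K):
    """Average MMSE when one uniformly chosen component x_i is observed without noise."""
    n = K.shape[0]
    total_power = np.trace(K).real
    err = 0.0
    for i in range(n):
        var_i = K[i, i].real
        # Error reduction from observing x_i: sum_t |K[t, i]|^2 / K[i, i]
        reduction = np.sum(np.abs(K[:, i]) ** 2) / var_i if var_i > 0 else 0.0
        err += total_power - reduction
    return err / n

N = 4
eigs = np.array([0.5, 0.3, 0.15, 0.05])     # illustrative spectrum, P = 1
P = eigs.sum()
idx = np.arange(N)
F = np.exp(-2j * np.pi * np.outer(idx, idx) / N) / np.sqrt(N)

err_identity = avg_error_one_sample(np.diag(eigs).astype(complex))
err_dft = avg_error_one_sample(F @ np.diag(eigs) @ F.conj().T)
```

With these values the identity basis attains $\frac{N-1}{N}P = 0.75$, while the DFT basis gives a strictly smaller error, consistent with the identity being a worst-case coordinate system.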



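Lemma 2.5 Suppose $|B| = 1$, i.e. $\lambda_k = P > 0$ and $\lambda_j = 0$, $j \neq k$, $j \in \{1, \ldots, N\}$. Let the channel
be the Gaussian erasure channel, i.e. $y = Hx + n$, where $H = \operatorname{diag}(\delta_i)$, the $\delta_i$ are i.i.d. Bernoulli
random variables, and $K_n = \sigma_n^2 I_N$. Then the minimum error is given by

$$E\left[\frac{1}{\frac{1}{P} + \frac{1}{\sigma_n^2}\frac{1}{N}\sum_{i=1}^{N}\delta_i}\right], \qquad (50)$$

where this optimum is achieved by any unitary matrix whose $k$-th column entries satisfy $|u_{ik}|^2 = 1/N$,
$i = 1, \ldots, N$.

Proof: Let $v = [v_1, \ldots, v_N]^T$, $v_i = |u_{ik}|^2$, $i = 1, \ldots, N$, where $T$ denotes transpose. The error can be written as

$$E\Big[\big(\tfrac{1}{P} + \tfrac{1}{\sigma_n^2} U_B^\dagger H^\dagger H U_B\big)^{-1}\Big] = E\left[\frac{1}{\frac{1}{P} + \frac{1}{\sigma_n^2}\sum_{i=1}^{N}\delta_i |u_{ik}|^2}\right] = E\left[\frac{1}{\frac{1}{P} + \frac{1}{\sigma_n^2}\sum_{i=1}^{N}\delta_i v_i}\right]. \qquad (51)$$

The proof uses an argument in the proof of [U Thm. 1], which is also used in [IT]. Let $\Pi_i \in \mathbb{R}^{N \times N}$
denote the permutation matrix indexed by $i = 1, \ldots, N!$. We note that a feasible vector $v$ satisfies
$\sum_{i=1}^{N} v_i = 1$, $v_i \ge 0$, which forms a convex set. We observe that for any such $v$, the weighted sum of all
permutations of $v$, $\bar{v} = \frac{1}{N!}\sum_{i=1}^{N!}\Pi_i v = (\frac{1}{N}\sum_{i=1}^{N} v_i)[1, \ldots, 1]^T = [\frac{1}{N}, \ldots, \frac{1}{N}]^T \in \mathbb{R}^N$, is a constant vector
and also feasible. We note that $g(v) = E\big[1/(\frac{1}{P} + \frac{1}{\sigma_n^2}\sum_i \delta_i v_i)\big]$ is a convex function of $v$ over the feasible set, and that $g(\Pi_i v) = g(v)$ for every permutation since the $\delta_i$ are i.i.d.

Hence $g(v) = \frac{1}{N!}\sum_i g(\Pi_i v) \ge g(\bar{v}) = g([1/N, \ldots, 1/N])$ for all $v$, and $\bar{v}$ is the optimum solution. Since there exists a
unitary matrix satisfying $|u_{ik}|^2 = 1/N$ for any given $k$ (such as any unitary matrix whose $k$-th column
is any column of the DFT matrix), the claim is proved.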
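The symmetrization argument above can be illustrated by brute-force enumeration of the erasure patterns. This is a minimal sketch under assumed parameters (`N = 5`, `p = 0.4`, `P = 1`, `sigma2 = 0.1` are illustrative, and `expected_error` is a hypothetical helper implementing $g(v)$ from the proof), not the authors' computation.

```python
import itertools
import numpy as np

def expected_error(v, p, P=1.0, sigma2=0.1):
    """g(v) = E[ 1 / (1/P + sum_i delta_i v_i / sigma2) ], delta_i i.i.d. Bernoulli(p)."""
    n = len(v)
    total = 0.0
    for delta in itertools.product([0, 1], repeat=n):
        d = np.array(delta)
        prob = p ** d.sum() * (1 - p) ** (n - d.sum())
        total += prob / (1.0 / P + d @ v / sigma2)
    return total

N, p = 5, 0.4
rng = np.random.default_rng(1)

flat = np.full(N, 1.0 / N)                  # v_i = 1/N, e.g. a DFT column
g_flat = expected_error(flat, p)

# Random feasible vectors (nonnegative, summing to one) for comparison.
g_random = [expected_error(rng.dirichlet(np.ones(N)), p) for _ in range(20)]

# Fully concentrated column, e.g. a column of the identity matrix.
g_concentrated = expected_error(np.eye(N)[0], p)
```

In this setting every sampled feasible `v` gives an error at least as large as the flat vector, and the concentrated column is the costliest of the cases tried.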



Lemma 2.6 Let $K_x^{-1} \succ 0$. Instead of fixing the eigenvalue distribution, let us consider the relaxed
constraint $\operatorname{tr}(K_x^{-1}) = R$. Let $K_n \succ 0$. Let the channel be the Gaussian erasure channel, i.e. $y = Hx + n$,
$H = \operatorname{diag}(\delta_i)$, where the $\delta_i$ are i.i.d. Bernoulli random variables with probability of success $p$. Then

$$\arg\min_{K_x^{-1}} E_{H,S}\big[\|x - E[x \mid y]\|^2\big] = \arg\min_{K_x^{-1}} E_H\Big[\operatorname{tr}\big((K_x^{-1} + H^\dagger K_n^{-1} H)^{-1}\big)\Big] \qquad (52)$$

is a circulant matrix.

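Proof: The proof uses an argument in the proof of [6, Thm. 12], [5]. Let $\Pi$ be the following
permutation matrix,

$$\Pi = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ 1 & 0 & 0 & \cdots & 0 \end{bmatrix}. \qquad (53)$$

We observe that $\Pi$ and $\Pi^l$ (the $l$-th power of $\Pi$) are unitary matrices. We form the following matrix,
$\bar{K}_x^{-1} = \frac{1}{N}\sum_{l=0}^{N-1}\Pi^l K_x^{-1}(\Pi^l)^\dagger$, which also satisfies the power constraint $\operatorname{tr}(\bar{K}_x^{-1}) = R$. We note that
since $K_x^{-1} \succ 0$, so is $\bar{K}_x^{-1} \succ 0$, hence $\bar{K}_x$ is well-defined.

$$E_H\Big[\operatorname{tr}\Big(\big(\tfrac{1}{N}\textstyle\sum_{l=0}^{N-1}\Pi^l K_x^{-1}(\Pi^l)^\dagger + H^\dagger K_n^{-1} H\big)^{-1}\Big)\Big] \le \frac{1}{N}\sum_{l=0}^{N-1} E_H\Big[\operatorname{tr}\big((\Pi^l K_x^{-1}(\Pi^l)^\dagger + H^\dagger K_n^{-1} H)^{-1}\big)\Big] \qquad (54)$$
$$= \frac{1}{N}\sum_{l=0}^{N-1} E_H\Big[\operatorname{tr}\big((\Pi^l (K_x^{-1} + (\Pi^l)^\dagger H^\dagger K_n^{-1} H \Pi^l)(\Pi^l)^\dagger)^{-1}\big)\Big]$$
$$= \frac{1}{N}\sum_{l=0}^{N-1} E_H\Big[\operatorname{tr}\big((K_x^{-1} + (\Pi^l)^\dagger H^\dagger K_n^{-1} H \Pi^l)^{-1}\big)\Big] \qquad (55)$$
$$= \frac{1}{N}\sum_{l=0}^{N-1} E_H\Big[\operatorname{tr}\big((K_x^{-1} + H^\dagger K_n^{-1} H)^{-1}\big)\Big] \qquad (56)$$
$$= E_H\Big[\operatorname{tr}\big((K_x^{-1} + H^\dagger K_n^{-1} H)^{-1}\big)\Big]. \qquad (57)$$

We note that $\operatorname{tr}((M + H^\dagger K_n^{-1} H)^{-1})$ is a convex function of $M$ over the set $M \succ 0$, since $\operatorname{tr}(M^{-1})$ is a
convex function (see for example [27, Exercise 3.18]), and composition with an affine mapping preserves
convexity [27, Sec. 3.2.2]. Hence the first inequality follows from Jensen's inequality. (55) is due to
the fact that the $\Pi^l$ are unitary and the trace is invariant under unitary transforms. (56) follows from the fact
that $H\Pi^l$ has the same distribution as $H$. Hence we have shown that $\bar{K}_x^{-1}$ provides a lower bound
for arbitrary $K_x^{-1}$ satisfying the power constraint. Since $\bar{K}_x^{-1}$ is circulant and also satisfies the power
constraint $\operatorname{tr}(\bar{K}_x^{-1}) = R$, the optimum $K_x^{-1}$ should be circulant.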
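The cyclic averaging step can be illustrated numerically. The sketch below is written under the simplifying assumption $K_n = \sigma_n^2 I$, so that $H^\dagger K_n^{-1} H = \operatorname{diag}(\delta_i)/\sigma_n^2$; the dimension, erasure probability, noise level and the helper `expected_error` are hypothetical choices made only for this example.

```python
import itertools
import numpy as np

def expected_error(K_inv, p, sigma2):
    """E_H[ tr((K_inv + diag(delta)/sigma2)^{-1}) ] by enumerating all erasure patterns."""
    n = K_inv.shape[0]
    total = 0.0
    for delta in itertools.product([0, 1], repeat=n):
        d = np.array(delta, dtype=float)
        prob = p ** d.sum() * (1 - p) ** (n - d.sum())
        total += prob * np.trace(np.linalg.inv(K_inv + np.diag(d) / sigma2))
    return total

N, p, sigma2 = 4, 0.5, 0.2
rng = np.random.default_rng(2)
A = rng.standard_normal((N, N))
K_inv = A @ A.T + 0.1 * np.eye(N)           # an arbitrary positive definite K_x^{-1}

Pi = np.roll(np.eye(N), 1, axis=1)          # cyclic shift: ones on the superdiagonal and bottom-left corner
K_inv_bar = sum(
    np.linalg.matrix_power(Pi, l) @ K_inv @ np.linalg.matrix_power(Pi, l).T
    for l in range(N)
) / N

is_circulant = all(np.allclose(np.roll(K_inv_bar[:, 0], k), K_inv_bar[:, k]) for k in range(N))
err_original = expected_error(K_inv, p, sigma2)
err_averaged = expected_error(K_inv_bar, p, sigma2)
```

The averaged matrix keeps the same trace, is circulant, and does not increase the expected error, which mirrors the chain (54) to (57).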

We note that we cannot follow the same argument for the constraint $\operatorname{tr}(K_x) = P$, since the objective
function is concave in $K_x$ over the set $K_x \succ 0$. This can be seen as follows: $E[\|x - E[x \mid y]\|^2] = \operatorname{tr}(K_e)$, where $K_e = K_x - K_{xy} K_y^{-1} K_{xy}^\dagger$. We note that $K_e$ is the Schur complement of $K_y$ in $K = [K_y \ K_{yx}; K_{xy} \ K_x]$, where $K_y = H K_x H^\dagger + K_n$, $K_{xy} = K_x H^\dagger$. The Schur complement is matrix concave in
$K \succ 0$; see for example [27, Exercise 3.58]. Since the trace is a linear operator, $\operatorname{tr}(K_e)$ is concave in $K$.
Since $K$ is an affine mapping of $K_x$, and composition with an affine mapping preserves concavity [27,
Sec. 3.2.2], $\operatorname{tr}(K_e)$ is concave in $K_x$.



Lemma 2.7 The DFT matrix is, in general, not an optimizer of Problem P1 for the Gaussian erasure
channel.

Proof: We provide a counterexample to prove the claim of the lemma: an example where a
unitary matrix not satisfying $|u_{ij}|^2 = 1/N$ outperforms the DFT matrix. Let $N = 3$. Let $\Lambda_x = \operatorname{diag}(1/6, 2/6, 3/6)$, and $K_n = I$. Let $U_0$ be

$$U_0 = \begin{bmatrix} 1/\sqrt{2} & 0 & 1/\sqrt{2} \\ 0 & 1 & 0 \\ -1/\sqrt{2} & 0 & 1/\sqrt{2} \end{bmatrix}. \qquad (58)$$

Hence $K_x$ becomes

$$K_x = \begin{bmatrix} 1/3 & 0 & 1/6 \\ 0 & 1/3 & 0 \\ 1/6 & 0 & 1/3 \end{bmatrix}. \qquad (59)$$

We write the average error as a sum conditioned on the number of measurements as $J(U) = \sum_{M=0}^{3} p^M (1-p)^{3-M} e_M(U)$, where $e_M$ denotes the total error of all cases where $M$ measurements are done. Let
$e(U) = [e_0(U), e_1(U), e_2(U), e_3(U)]$. The calculations reveal that $e(U_0) = [1, 65/24, 409/168, 61/84]$
whereas $e(F) = [1, 65/24, 465/191, 61/84]$, where $F$ is the DFT matrix. We see that all the entries
are the same as in the DFT case, except $e_2(U_0) < e_2(F)$, where $e_2(U_0) = 409/168 \approx 2.434524$ and
$e_2(F) = 465/191 \approx 2.434555$. Hence $U_0$ outperforms the DFT matrix.

We note that our argument covers any unitary matrix that is formed by changing the order of the
columns of the DFT matrix, i.e. any matching of the given eigenvalues and the columns of the DFT
matrix: $U_0$ provides better performance than any $K_x$ formed by using the given eigenvalues and any
unitary matrix formed with columns from the DFT matrix. The reported error values hold for all such
$K_x$.
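The error vectors of the counterexample can be reproduced by enumerating all measurement subsets. The following is a sketch, assuming unit noise variance ($K_n = I$) and the standard Gaussian MMSE formula $K_e = K_x - K_{xy}K_y^{-1}K_{xy}^\dagger$; the function name `error_profile` is hypothetical.

```python
from itertools import combinations
import numpy as np

def error_profile(U, eigs, sigma2=1.0):
    """e_M(U): total MMSE over all subsets of M measured components, M = 0..N."""
    n = len(eigs)
    K = U @ np.diag(eigs) @ U.conj().T
    profile = []
    for m in range(n + 1):
        total = 0.0
        for rows in combinations(range(n), m):
            rows = list(rows)
            if not rows:                      # no measurement: error is tr(K_x)
                total += np.trace(K).real
                continue
            K_xy = K[:, rows]
            K_y = K[np.ix_(rows, rows)] + sigma2 * np.eye(m)
            K_e = K - K_xy @ np.linalg.solve(K_y, K_xy.conj().T)
            total += np.trace(K_e).real
        profile.append(total)
    return np.array(profile)

eigs = np.array([1, 2, 3]) / 6.0
s = 1 / np.sqrt(2)
U0 = np.array([[s, 0, s], [0, 1, 0], [-s, 0, s]], dtype=complex)
idx = np.arange(3)
F = np.exp(-2j * np.pi * np.outer(idx, idx) / 3) / np.sqrt(3)

e_U0 = error_profile(U0, eigs)
e_F = error_profile(F, eigs)
```

Only the two-measurement entry differs between the two profiles, and the gap is small (about $3 \times 10^{-5}$), so floating-point comparisons should use a tolerance well below that.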



2.3 Rate-Distortion Bound 

We note that by combining the rate distortion theorem and the converse to the channel coding theorem, 
one can see that the rate-distortion function lower bounds the channel capacity for a given channel 
structure [28]. We now show that this rate-distortion bound is not achievable with the channel structure 
we have. 

We consider the scalar real channel $y = \alpha u a + n$, where $\alpha = 1$ with probability $p$, and $\alpha = 0$ with
probability $1 - p$. Let $ua = x$. Let $a$ and $n$ be independent zero-mean Gaussian random variables.
When needed, we emphasize the random variables the expectations are taken with respect to; we denote
the expectation with respect to the random channel gain by $E_\alpha[\cdot]$, and the expectation with respect
to the random signals involved (including $x$ and $n$) by $E_s[\cdot]$. Assuming knowledge of the realization of $\alpha$ at
the receiver, but not at the transmitter, the capacity of this channel with power constraint $P_x < \infty$ is
given by

$$C = \max_{E_s[x^2] \le P_x} E_\alpha[I(x; y)] = \max_{E_s[x^2] \le P_x} \big[p\,I(ua + n; x) + (1 - p)\,I(0; x)\big] = p\,0.5\log\Big(1 + \frac{P_x}{\sigma_n^2}\Big). \qquad (60)$$

Here we have used the fact that the capacity of an additive Gaussian channel with noise variance $\sigma_n^2$
and power constraint $P_x$ is $0.5\log(1 + P_x/\sigma_n^2)$.

The rate-distortion function of a Gaussian random variable with variance $\sigma_a^2$ is given as

$$R(D) = \min_{f(\hat{a}|a),\ E[(a-\hat{a})^2] \le D} I(a; \hat{a}) = \max\Big\{0.5\log\Big(\frac{\sigma_a^2}{D}\Big), 0\Big\}. \qquad (61)$$

We note that by the converse to the channel coding theorem, for a given channel structure with capacity
$C$, we have $R(D) \le C$, which provides $D(C) \le E[(a - \hat{a})^2]$ [28]. Hence

$$E_{\alpha,s}[(a-\hat{a})^2] = p\,E_s[(a-\hat{a})^2 \mid \alpha = 1] + (1-p)\,E_s[(a-\hat{a})^2 \mid \alpha = 0] \qquad (62)$$
$$\ge p\,D(R) + (1-p)\,D(R) \qquad (63)$$
$$= \sigma_a^2\, 2^{-2R} \qquad (64)$$
$$\ge \sigma_a^2\, 2^{-p\log(1 + \frac{P_x}{\sigma_n^2})} \qquad (65)$$
$$= \sigma_a^2 \Big(\frac{\sigma_n^2}{P_x + \sigma_n^2}\Big)^{p}, \qquad (66)$$

where we have used the fact that $C(\alpha) \ge R(D)$ for each realization of the channel, hence $C = pC(\alpha = 1) + (1 - p)C(\alpha = 0) \ge pR(D) + (1 - p)R(D) = R(D)$. On the other hand, the average error of this
system with Gaussian input $a$, $\sigma_a^2 u^2 = \sigma_x^2 = P_x$, is

$$E_{\alpha,s}[(a-\hat{a})^2] = (1-p)\,\sigma_a^2 + p\Big(\sigma_a^2 - \frac{\sigma_a^2 P_x}{P_x + \sigma_n^2}\Big) \qquad (67)$$
$$= (1-p)\,\sigma_a^2 + p\,\frac{\sigma_n^2}{P_x + \sigma_n^2}\,\sigma_a^2. \qquad (68)$$

We observe that (68) is strictly larger than the bound in (66) for $0 < p < 1$, $\sigma_n^2 > 0$. (This follows
from the fact that $f(x) = b^x$, $b > 0$, $b \neq 1$, is a strictly convex function, so that $f((1 - p)x_1 + p x_2) < (1 - p)f(x_1) + p f(x_2)$ for $0 < p < 1$, $x_1 \neq x_2$. Hence with $b = \frac{\sigma_n^2}{P_x + \sigma_n^2}$, $0 < P_x < \infty$, $x_1 = 0$, $x_2 = 1$, the
inequality follows.)
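The gap between the achieved error (68) and the rate-distortion bound (66) can be tabulated directly. This is a small illustrative sketch; the values of `sigma_a2`, `P_x` and `sigma_n2` are assumptions chosen for the example.

```python
import numpy as np

sigma_a2, P_x, sigma_n2 = 1.0, 4.0, 0.5      # illustrative values
b = sigma_n2 / (P_x + sigma_n2)              # b = sigma_n^2 / (P_x + sigma_n^2), 0 < b < 1

p = np.linspace(0.05, 0.95, 19)              # erasure-free probabilities strictly inside (0, 1)
rd_bound = sigma_a2 * b ** p                 # right-hand side of (66)
achieved = (1 - p) * sigma_a2 + p * b * sigma_a2   # average MMSE in (68)
gap = achieved - rd_bound

# At the end points p = 0 and p = 1 the two expressions coincide.
gap_at_ends = [(1 - q) * sigma_a2 + q * b * sigma_a2 - sigma_a2 * b ** q for q in (0.0, 1.0)]
```

The gap is positive for every interior `p` and vanishes at the end points, which is the strict convexity of $b^x$ at work.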

3 Problem P2: Random Sampling/Support at a Fixed Measurement 
Domain - Error Bounds That Hold with High Probability 

In the previous section, we have focused on the average MMSE performance of random scalar Gaussian 
channel and Gaussian erasure channel. In this section we consider a closely related sampling strategy, 
and focus on MMSE bounds that hold with high probability. 

In this section, we assume that the nonzero eigenvalues are equal, i.e. $\Lambda_{x,B} = \frac{P}{|B|} I_{|B|}$, where $|B| \le N$.
We are interested in the MMSE estimation performance of two set-ups: i) sampling of a signal with
fixed support at randomly chosen measurement locations; ii) sampling of a signal with random support
at fixed measurement locations. We investigate bounds on the MMSE depending on the support size
or the number of measurements. We illustrate how results in matrix theory, mostly presented in the
compressive sampling framework, can provide error bounds for these scenarios. We note that there are
studies that consider the MMSE in the compressive sensing framework, such as [18], [TO], which focus on the
scenario where the receiver does not know the location of the signal support. In our case we assume that
the receiver has full knowledge of the signal covariance matrix.

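We again consider the set-up in (1). The sampling operation can be modelled with an $M \times N$
matrix $H$, whose rows are taken from the identity matrix as dictated by the sampling operation. We let
$U_{MB} = HU_B$ be the $M \times |B|$ submatrix of $U$ formed by taking $|B|$ columns and $M$ rows as dictated
by $B$ and $H$, respectively. The MMSE can be written as (see (11))

$$E[\|x - E[x \mid y]\|^2] = \operatorname{tr}\big((\Lambda_{x,B}^{-1} + \tfrac{1}{\sigma_n^2} U_B^\dagger H^\dagger H U_B)^{-1}\big) \qquad (69)$$
$$= \sum_{i=1}^{|B|} \frac{1}{\lambda_i\big(\Lambda_{x,B}^{-1} + \frac{1}{\sigma_n^2} U_{MB}^\dagger U_{MB}\big)} \qquad (70)$$
$$= \sum_{i=1}^{|B|} \frac{1}{\frac{|B|}{P} + \frac{1}{\sigma_n^2}\lambda_i(U_{MB}^\dagger U_{MB})}. \qquad (71)$$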
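The eigenvalue form of the MMSE can be cross-checked against the usual covariance-based expression. The sketch below assumes a DFT basis, an arbitrary support `B`, randomly drawn sampling rows and illustrative values for `P` and `sigma2`; all of these names and values are choices made for the example.

```python
import numpy as np

N, M, P, sigma2 = 16, 6, 1.0, 0.05
B = [1, 4, 9]                                # illustrative support, |B| = 3
rng = np.random.default_rng(3)
rows = np.sort(rng.choice(N, size=M, replace=False))

idx = np.arange(N)
U = np.exp(-2j * np.pi * np.outer(idx, idx) / N) / np.sqrt(N)
U_B = U[:, B]                                # N x |B|
U_MB = U_B[rows, :]                          # M x |B| submatrix H U_B

# Direct MMSE from the covariance: tr(K_x - K_xy K_y^{-1} K_xy^dagger).
K_x = (P / len(B)) * U_B @ U_B.conj().T
K_xy = K_x[:, rows]
K_y = K_x[np.ix_(rows, rows)] + sigma2 * np.eye(M)
mmse_direct = np.trace(K_x - K_xy @ np.linalg.solve(K_y, K_xy.conj().T)).real

# MMSE from the eigenvalues of U_MB^dagger U_MB.
eig = np.linalg.eigvalsh(U_MB.conj().T @ U_MB)
mmse_eig = np.sum(1.0 / (len(B) / P + eig / sigma2))
```

The two numbers should agree to numerical precision, so bounding the eigenvalues of $U_{MB}^\dagger U_{MB}$ is enough to bound the error.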

We see that the estimation error is determined by the eigenvalues of the matrix $U_{MB}^\dagger U_{MB}$. We note
that many results in the compressive sampling framework make use of bounds on the eigenvalues of
this matrix. We now use some of these results to bound the MMSE performance in different sampling
scenarios. Different bounds found in the literature can be used; we pick some of them
to make the constants explicit.

Lemma 3.1 Let $U$ be an $N \times N$ unitary matrix with $\sqrt{N}\max_{k,j} |u_{k,j}| = \mu(U)$. Let the signal have
fixed support $B$ on the signal domain. Let the sampling locations be chosen uniformly at random from
the set of all subsets of the given size $M$. Let noisy measurements with noise power $\sigma_n^2$ be done at these
$M$ locations. Then for sufficiently large $M(\mu)$, the error is bounded from above with high probability:

$$\varepsilon < \frac{|B|}{\frac{|B|}{P} + \frac{1}{\sigma_n^2}\frac{0.5M}{N}}. \qquad (72)$$

More precisely, if

$$M \ge |B|\,\mu^2(U)\max(C_1 \log|B|,\ C_2 \log(3/\delta)) \qquad (73)$$

for some positive constants $C_1$ and $C_2$, then

$$P\left(\varepsilon \ge \frac{|B|}{\frac{|B|}{P} + \frac{1}{\sigma_n^2}\frac{0.5M}{N}}\right) \le \delta. \qquad (74)$$

In particular, when the measurements are noiseless, the error is zero with probability at least $1 - \delta$.

Proof: We first note that $\|U_{MB}^\dagger U_{MB} - I\| < c$ implies $1 - c < \lambda_i(U_{MB}^\dagger U_{MB}) < 1 + c$. Consider
Theorem 1.2 of [2]. Suppose that $M$ and $|B|$ satisfy (73). Now looking at Theorem 1.2, and noting
the scaling of the matrix $U^\dagger U = NI$ in [2], we see that $P(0.5\frac{M}{N} < \lambda_i(U_{MB}^\dagger U_{MB}) < 1.5\frac{M}{N}) \ge 1 - \delta$.
By (71) the result follows.

For the noiseless measurements case, let $A_{\sigma_n^2}$ be the event $\big\{\varepsilon < |B| / \big(\frac{|B|}{P} + \frac{1}{\sigma_n^2}\frac{0.5M}{N}\big)\big\}$. Hence

$$\lim_{\sigma_n^2 \to 0} P(A_{\sigma_n^2}) = \lim_{\sigma_n^2 \to 0} E[1_{A_{\sigma_n^2}}] \qquad (75)$$
$$= E\big[\lim_{\sigma_n^2 \to 0} 1_{A_{\sigma_n^2}}\big] \qquad (76)$$
$$= P(\varepsilon = 0), \qquad (77)$$

where we have used the Dominated Convergence Theorem to change the order of the expectation and the
limit. By (74), $P(A_{\sigma_n^2}) \ge 1 - \delta$, hence $P(\varepsilon = 0) \ge 1 - \delta$. We also note that in the noiseless case, it is
enough to have $\lambda_{\min}(U_{MB}^\dagger U_{MB})$ bounded away from zero to have zero error with high probability; the
exact value of the bound is not important.

We note that when other parameters are fixed, as $\max_{k,j} |u_{k,j}|$ gets smaller, fewer samples
are required. Since $\sqrt{1/N} \le \max_{k,j} |u_{k,j}| \le 1$, the unitary transforms that provide the best guarantees
are the ones satisfying $|u_{k,j}| = \sqrt{1/N}$. We note that for any such unitary transform, the covariance
matrix has constant diagonal with $(K_x)_{ii} = P/N$ regardless of the eigenvalue distribution. Hence
with any measurement scheme with $M$ noiseless measurements, the reduction in the uncertainty is
guaranteed to be at least proportional to the number of measurements, i.e. the error satisfies $\varepsilon \le P - \frac{M}{N}P$.
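The role of $\mu(U)$ and the guaranteed reduction $\varepsilon \le P - \frac{M}{N}P$ can be illustrated as follows. This is a sketch with assumed values: the spectrum, the sizes `N` and `M`, and the helper `coherence` are hypothetical, and the spectrum is kept strictly positive so that the noiseless $K_y$ is invertible.

```python
import numpy as np

def coherence(U):
    """mu(U) = sqrt(N) * max_{k,j} |u_kj| for an N x N unitary matrix U."""
    return np.sqrt(U.shape[0]) * np.max(np.abs(U))

N, M = 8, 3
rng = np.random.default_rng(4)
eigs = rng.random(N) + 0.1                   # illustrative positive spectrum
P = eigs.sum()
idx = np.arange(N)
F = np.exp(-2j * np.pi * np.outer(idx, idx) / N) / np.sqrt(N)

mu_dft = coherence(F)                        # smallest possible value
mu_identity = coherence(np.eye(N))           # largest possible value, sqrt(N)

# Noiseless MMSE for M randomly chosen sample locations with the DFT basis.
K = F @ np.diag(eigs) @ F.conj().T
rows = np.sort(rng.choice(N, size=M, replace=False))
K_xy = K[:, rows]
K_y = K[np.ix_(rows, rows)]
error = np.trace(K - K_xy @ np.linalg.solve(K_y, K_xy.conj().T)).real
```

Because the measured components are recovered exactly and every component has variance $P/N$, the remaining error cannot exceed the total variance of the $N - M$ unmeasured components.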

We now consider a signal sampled at fixed locations with random support uniformly chosen from
the set of supports with a given size. We note that in this case results such as Theorem 12 of [3] or
Theorem 2 of [29] (and the references therein), which explore bounds on the eigenvalues of random
submatrices obtained by uniform column sampling, can be used for bounding the estimation error. We
assume that the receiver has access to the support set information. In the following we assume the field
is real, i.e. $x \in \mathbb{R}^N$ and $y \in \mathbb{R}^M$. The s.v.d. of $K_x$ is given as $K_x = U\Lambda_x U^\dagger$, where $U$ is orthonormal,
i.e. $U \in \mathbb{R}^{N \times N}$, $U^\dagger U = I_N$. We note that normalized Hadamard matrices satisfy $|u_{ij}|^2 = \frac{1}{N}$ and are
orthonormal, as required in the lemma. For the proper complex Gaussian case the argument is similar,
and Theorem 12 of [3] can be used.

Lemma 3.2 Let $U$ be an $N \times N$ orthonormal matrix such that $|u_{ij}|^2 = \frac{1}{N}$. Let the $M$ locations at the
measurement domain be fixed, and let $H$ be the $M \times N$ diagonal matrix. Let $\mu$ be defined by

$$\mu = \frac{N}{M}\max_{j \neq k} \big|\langle (HU)_j, (HU)_k \rangle\big|, \qquad (78)$$

where $(HU)_j$ denotes the $j$-th column of $HU$. Let the support of the signal be chosen uniformly from
the set of all subsets of the given size $|B| < N$. Then for sufficiently small $|B|$, the error is bounded
from above with high probability

\B\ 



£ < T5i — (79) 

p " N 

with r £ (0, 1). More precisely, let a > 1, and n > 1. Assume that \i < ^ N and \B\ < N ^^^ log N 

With °B < 3?lfi^ and C » ^ Jillr^ ^- TheH 

P(e > -Jgl — ) < 2592AT- (80) 
~r + ^ ~ r )^l^ 

In particular, when the measurements are noiseless, the error is zero with probability at least $1 - 2592N^{-\alpha}$. We note that, as observed in [29], it is sufficient to have $\alpha \log N > 8$ to ensure that the
probability bounds are non-trivial.

Proof: We note that $X = \sqrt{\frac{N}{M}}HU$ has unit norm columns and $\mu$ given in (78) is the coherence of
$X$ as defined by equation [1.3] of [29]. We also note that $HU$ is full rank, that is, the rank of $HU$ is equal to
the largest possible value, i.e. $M$, since $U$ is orthogonal. We also note that $\|X\| = \|\sqrt{\frac{N}{M}}HU\| = \sqrt{\frac{N}{M}}\|HU\|$.
Hence we can use Theorem 2 of [29] to bound the singular values of $\sqrt{\frac{N}{M}}HU_B$. As in the proof of the
previous lemma, the result follows from (71). The noiseless case follows similarly to the previous lemma.
Again it is enough to have $\lambda_{\min}(U_{MB}^\dagger U_{MB})$ bounded away from zero to have zero error with high
probability.

We note that the conclusions derived in this section are based on high probability results for the
norm of a matrix restricted to a random set of coordinates. We note that for the purposes of such
results, the uniform random sampling model and the Bernoulli sampling model, where each component
is taken independently and with equal probability, are equivalent [T], [HJ [30]. For instance, the derivation
of Theorem 1.2 of [2], the main step of Lemma 3.1, is in fact based on a Bernoulli sampling model.
Hence the high probability results presented there also hold for the Gaussian erasure channel of Section 2
(with possibly different parameters).



4 Problem P3: Random Projections - Error Bounds That Hold With 
High Probability 

In this section we consider the measurement strategy where $M$ random projections of the signal are
taken: the measurement system matrix $H$ is an $M \times N$, $M < N$, matrix with Gaussian i.i.d. entries. In
this section we assume that the field is real. We also assume that $\Lambda_x$ is positive-definite.

We note that the matrix theory result used in this section is novel, and provides fundamental
insights into the problem of estimating signals with a small effective number of degrees of freedom. In the
previous section we have used some results from the compressive sensing literature that are directly applicable
only when the signals are known to be exactly sparse (some of the eigenvalues of $K_x$ are exactly equal
to zero). In this section we assume a more general eigenvalue distribution. Our result enables us
to draw conclusions when some of the eigenvalues are not exactly zero, but small. The method of proof
provides us with a way to see the effects of the effective number of degrees of freedom of the signal ($\Lambda_x$) and
the incoherence of the measurement domain ($HU$) separately.

Before stating our lemma, we make some observations on the related results in random matrix
theory. Consider the submatrices formed by restricting a matrix $K$ to a random set of its rows or
columns: $R_1K$ or $KR_2$, where $R_1$ and $R_2$ denote the restrictions to rows and columns respectively. The
main tool for finding bounds on the eigenvalues of these submatrices is finding a bound on $E\|R_1K - E[R_1K]\|$ or $E\|KR_2 - E[KR_2]\|$ [3], [31], [29]. In our case such an approach is not very meaningful. The
matrix we are investigating, $\Lambda_x^{-1} + (HU)^\dagger(HU)$, consists of two matrices: a deterministic diagonal
matrix with possibly different entries on the diagonal, and a random restriction. Contrary to a sole
random restriction, this matrix does not stay around its mean. Hence we adopt another method: the
approach of decomposing the unit sphere into compressible and incompressible vectors, as proposed by
M. Rudelson and R. Vershynin [32].

We note that when the eigenvalues of $K_x$ have a rectangular spread, using the method in Lemma 3.1
and, for example, Proposition 2.5 of [32], [33], one can prove that it is possible to achieve low
values of MMSE with high probability also for random projections. Here we focus on the case where
$\Lambda_x \succ 0$ to see the effects of other eigenvalue spreads. We also note that the general methodology in
this section can be extended to the case where $H$ has complex entries. In this case the channel will be
a Rayleigh fading channel.

We consider the general measurement set-up in (1), where $y = Hx + n$, with $K_n = \sigma_n^2 I$, $K_x \succ 0$,
and assume the field is real, i.e. $x \in \mathbb{R}^N$ and $n \in \mathbb{R}^M$. The s.v.d. of $K_x$ is given as $K_x = U\Lambda_x U^\dagger$,
where $U \in \mathbb{R}^{N \times N}$ is orthonormal and $\Lambda_x = \operatorname{diag}(\lambda_i)$ with $\sum_i \lambda_i = P$, $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_N$.



Theorem 4.1 Let $H$ be an $M \times N$, $M < N$, $M = \beta N$, matrix with Gaussian i.i.d. entries with
variances $\sigma_H^2$ at least 1. Let $D(\delta)$ be the smallest number satisfying $\sum_{i=1}^{D} \lambda_i \ge \delta P$, where $\delta \in (0, 1]$.

Assume that D(5) + M < N , and Aj < C\jj, i = D + 1, . . . , N . Then there exist C , C\, T , T\ that 
depend on a^, C\, ft such that if D{8) < T, and M > T\ the error will satisfy 

P(E[\\x - E[x\y]\\ 2 ] > (1 - S)P + ^^4A P ) < e ~° lN ( 81 ) 



Remark 4.1 As we will see in the proof, the eigenvalue distribution plays a key role in obtaining
stronger bounds: In particular, when the eigenvalue distribution is spread out, the theorem cannot
provide bounds for low values of error. As the distribution becomes less spread out, stronger bounds are
obtained. We discuss this point in Remark 7.1, Remark 7.2, and Remark 7.3. The effect of the noise level is
discussed in Remark 7.4.

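Proof: Let the eigenvalues of a matrix $A$ be denoted in decreasing order as $\lambda_1(A) \ge \lambda_2(A) \ge \cdots \ge \lambda_N(A)$.

We note that by [Lemma 5, [I]], $H$ and $HU$ have the same probability distribution. Hence we can
consider $H$ instead of $HU$ in our arguments. The error can be expressed as (see (11))

$$E[\|x - E[x \mid y]\|^2] = \operatorname{tr}\big((\Lambda_x^{-1} + \tfrac{1}{\sigma_n^2}H^\dagger H)^{-1}\big) \qquad (82)$$
$$= \sum_{i=1}^{N} \frac{1}{\lambda_i(\Lambda_x^{-1} + \frac{1}{\sigma_n^2}H^\dagger H)} \qquad (83)$$
$$= \sum_{i=1}^{N-M-D} \frac{1}{\lambda_i(\Lambda_x^{-1} + \frac{1}{\sigma_n^2}H^\dagger H)} + \sum_{i=N-M-D+1}^{N} \frac{1}{\lambda_i(\Lambda_x^{-1} + \frac{1}{\sigma_n^2}H^\dagger H)} \qquad (84)$$
$$\le \sum_{i=1}^{N-M-D} \frac{1}{\lambda_{i+M}(\Lambda_x^{-1})} + \sum_{i=N-M-D+1}^{N} \frac{1}{\lambda_i(\Lambda_x^{-1} + \frac{1}{\sigma_n^2}H^\dagger H)} \qquad (85)$$
$$\le \sum_{i=1}^{N-M-D} \lambda_{N-M-i+1}(\Lambda_x) + (M + D)\frac{1}{\lambda_{\min}(\Lambda_x^{-1} + \frac{1}{\sigma_n^2}H^\dagger H)} \qquad (86)$$
$$= \sum_{i=D+1}^{N-M} \lambda_i(\Lambda_x) + (M + D)\frac{1}{\lambda_{\min}(\Lambda_x^{-1} + \frac{1}{\sigma_n^2}H^\dagger H)}, \qquad (87)$$

where the first inequality follows from case (a) of Lemma 2.2 and the fact that $H^\dagger H$ is at most rank
$M$.

Hence the error is bounded as

$$E[\|x - E[x \mid y]\|^2] \le \sum_{i=D+1}^{N-M} \lambda_i(\Lambda_x) + (M + D)\frac{1}{\lambda_{\min}(\Lambda_x^{-1} + \frac{1}{\sigma_n^2}H^\dagger H)} \qquad (88)$$
$$\le (1 - \delta)P + (M + D)\frac{1}{\lambda_{\min}(\Lambda_x^{-1} + \frac{1}{\sigma_n^2}H^\dagger H)}. \qquad (89)$$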

The smallest eigenvalue of $\Lambda_x^{-1} + \frac{1}{\sigma_n^2}H^\dagger H$ is sufficiently away from zero with high probability, as
noted in the following lemma:

Lemma 4.1 Let $H$ be an $M \times N$, $M < N$, matrix with Gaussian i.i.d. entries. Assume that the
assumptions of Theorem 4.1 hold. Then, with the conditions stated in Theorem 4.1, the eigenvalues of
$\Lambda_x^{-1} + \frac{1}{\sigma_n^2}H^\dagger H$ are bounded from below as follows:

P( inf x^AZ l x + -^x^H^Hx < C-) < e~ ClN . (90) 



Here $S^{N-1}$ denotes the unit sphere, where $x \in S^{N-1}$ if $x \in \mathbb{R}^N$ and $\|x\| = 1$.
The proof of this lemma is given in Section 7.3 of the Appendix.

We now know that P(A min (A" 1 + XrfH) > C%) > 1 - e~ c ' N , and hence P(- — ^ , -r— < 

ij) > l-e" 01 ^. Together with the error bound in (J89|), we have P(E[\\X - E[X\Y}\\ 2 ] < (1-5)P + 
YU±Dpj > 1 _ g-CiN^ and the regult f n ows . ■ 
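The deterministic part of the argument, i.e. the bound of the error by the tail eigenvalues plus $(M + D)/\lambda_{\min}$, can be checked on a random draw. The sketch below is illustrative only: it assumes $D(\delta)$ is the smallest $D$ with $\sum_{i \le D}\lambda_i \ge \delta P$, a geometric spectrum, unit-variance Gaussian entries for `H`, and hypothetical names such as `effective_dof`.

```python
import numpy as np

def effective_dof(eigs_desc, delta):
    """Smallest D such that the D largest eigenvalues carry at least delta * P."""
    csum = np.cumsum(eigs_desc)
    return int(np.searchsorted(csum, delta * csum[-1] - 1e-12) + 1)

N, M, delta, sigma2 = 16, 4, 0.9, 0.1
eigs = 0.5 ** np.arange(1, N + 1)
eigs /= eigs.sum()                           # decreasing spectrum with P = 1
P = eigs.sum()
D = effective_dof(eigs, delta)

rng = np.random.default_rng(6)
H = rng.standard_normal((M, N))              # i.i.d. Gaussian projections
A = np.diag(1.0 / eigs) + H.T @ H / sigma2   # Lambda_x^{-1} + H^T H / sigma_n^2
mu = np.linalg.eigvalsh(A)                   # ascending eigenvalues

error = np.sum(1.0 / mu)                     # tr(A^{-1}), the MMSE
tail = eigs[D:N - M].sum()                   # sum_{i=D+1}^{N-M} lambda_i
bound_tail = tail + (M + D) / mu[0]
bound_delta = (1 - delta) * P + (M + D) / mu[0]
```

For this spectrum `D` evaluates to 4, and the error sits below both bounds; how tight the bounds are depends on how quickly the eigenvalues decay.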

5 Problem P4: Equidistant Sampling of Circularly Wide-Sense Stationary
Random Vectors

We now consider the MMSE associated with equidistant sampling of an important class of signals:
circularly wide-sense stationary (c.w.s.s.) signals, which are a way of modelling wide-sense stationary
signals in finite dimension. Let $x = [x_t,\ t \in I = \{0, \ldots, N-1\}]$ be a zero-mean, proper, c.w.s.s. Gaussian
random vector. We note that the covariance matrix of a c.w.s.s. signal is always circulant, so the
eigenvectors of the covariance matrix are given by the columns of the DFT matrix, $u_{tk} = \frac{1}{\sqrt{N}} e^{j\frac{2\pi}{N}tk}$,
where $0 \le t, k \le N-1$ [20]. Hence in this section we fix the unitary transform to be the DFT matrix.
We denote the associated eigenvalues by $\lambda_k$, $0 \le k \le N-1$, instead of indexing the eigenvalues in
decreasing/increasing order.

We note that since the columns of the DFT matrix satisfy $|u_{tk}| = \frac{1}{\sqrt{N}}$, the results of Section 3 are
applicable to c.w.s.s. signals. In particular, by Lemma 3.1 we conclude that c.w.s.s. signals provide
good performance with high probability for the case of random measurement locations with fixed
support. Hence among signals with a covariance matrix with a given rectangular eigenvalue spread,
c.w.s.s. signals are among the ones that can be estimated with low values of error with high probability
with a given number of measurements.

In this section, we consider the noiseless deterministic sampling strategy where one out of every $\Delta N$
samples is taken. We let $M = \frac{N}{\Delta N} \in \mathbb{Z}$, and assume that the first component is always measured for
convenience. Hence our measurements are in the form

$$y = Hx, \qquad (91)$$

where $H \in \mathbb{R}^{M \times N}$ is the sampling matrix formed by taking the rows of the identity matrix corresponding
to the observed variables.

We now present our main result in this section; an explicit expression and an upper bound for the 
mean-square error associated with the above set-up. 

Lemma 5.1 Let the model and the sampling strategy be as described above. Then the MMSE of
estimating $x$ from these equidistant samples can be expressed as

$$E[\|x - E[x \mid y]\|^2] = \sum_{k \in J_0}\left(\sum_{i=0}^{\Delta N - 1}\lambda_{iM+k} - \sum_{i=0}^{\Delta N - 1}\frac{\lambda_{iM+k}^2}{\sum_{l=0}^{\Delta N - 1}\lambda_{lM+k}}\right), \qquad (92)$$

where $J_0 = \{k : \sum_{l=0}^{\Delta N - 1}\lambda_{lM+k} \neq 0,\ 0 \le k \le M-1\} \subseteq \{0, \ldots, M-1\}$.

In particular, choose a set of indices $J \subseteq \{0, 1, \ldots, N-1\}$ with $|J| = M$ such that

$$jM + k \in J \Rightarrow iM + k \notin J, \quad \forall i, j,\ 0 \le i, j \le \Delta N - 1,\ i \neq j, \qquad (93)$$

with $0 \le k \le M-1$. Let $P_J = \sum_{i \in J}\lambda_i$. Then the MMSE is upper bounded by the total power in the
remaining eigenvalues:

$$E[\|x - E[x \mid y]\|^2] \le 2(P - P_J). \qquad (94)$$

In particular, if there is such a set $J$ so that $P_J = P$, the MMSE will be zero.

Remark 5.1 The set J essentially consists of the indices which do not overlap when shifted by M. 

Remark 5.2 We note that the choice of the set J is not unique, and each choice of the set of indices 
may provide a different upper bound. To obtain the lowest possible upper bound, one should consider 
the set with the largest total power. 

Remark 5.3 If there exists such a set $J$ that has most of the power, i.e. $P_J = \delta P$, $\delta \in (0, 1]$, with
$\delta$ close to 1, then $2(P - P_J) = 2(1 - \delta)P$ is small and the signal can be estimated with low values of
error. In particular, if such a set has all the power, i.e. $P = P_J$, the error will be zero. A conventional
aliasing free set $J$ may be the set of indices of the band of a band-pass signal with band smaller than
$M$. It is important to note that there may exist other sets $J$ with $P = P_J$, hence the signal may be
aliasing free even if the signal is not bandlimited (low-pass, high-pass etc.) in the conventional sense.
Proof: The proof is given in Section 7.4 of the Appendix.
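The closed-form expression and the upper bound of Lemma 5.1 can be compared with a direct covariance computation. The following sketch makes a few assumptions for illustration: sampling starts at index 0 with step `dN`, the DFT uses the $e^{+j2\pi tk/N}$ convention, and the spectra and the helper names (`mmse_direct`, `mmse_formula`, `upper_bound`) are hypothetical.

```python
import numpy as np

def mmse_direct(lam, dN):
    """MMSE from the circulant covariance when every dN-th sample is observed noiselessly."""
    N = len(lam)
    idx = np.arange(N)
    F = np.exp(2j * np.pi * np.outer(idx, idx) / N) / np.sqrt(N)
    K = F @ np.diag(lam) @ F.conj().T
    rows = np.arange(0, N, dN)
    K_xy = K[:, rows]
    K_y = K[np.ix_(rows, rows)]
    return np.trace(K - K_xy @ np.linalg.pinv(K_y) @ K_xy.conj().T).real

def mmse_formula(lam, dN):
    """Closed form: each group {k, k+M, k+2M, ...} of aliased eigenvalues contributes separately."""
    M = len(lam) // dN
    err = 0.0
    for k in range(M):
        group = lam[k::M]
        s = group.sum()
        if s > 0:
            err += s - np.sum(group ** 2) / s
    return err

def upper_bound(lam, dN):
    """2 (P - P_J) with J taking the largest eigenvalue of each aliased group."""
    M = len(lam) // dN
    P_J = sum(lam[k::M].max() for k in range(M))
    return 2 * (lam.sum() - P_J)

N, dN = 12, 3                                 # M = 4 samples
rng = np.random.default_rng(7)
lam_random = rng.random(N)

# Power only on indices {0, 5, 6, 11}: one index per residue class mod M, not a low-pass band.
lam_alias_free = np.zeros(N)
lam_alias_free[[0, 5, 6, 11]] = [0.4, 0.3, 0.2, 0.1]

direct_random = mmse_direct(lam_random, dN)
formula_random = mmse_formula(lam_random, dN)
direct_alias_free = mmse_direct(lam_alias_free, dN)
```

The second spectrum is not bandlimited in the conventional sense, yet its occupied indices do not overlap when shifted by $M$, so the error should vanish up to numerical precision.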

We observe that the bandwidth $W$ (or DOF) turns out to be a good predictor of estimation error for this
case. On the other hand, the differential entropy of an effectively $W$-bandlimited Gaussian vector can
be very small even if the bandwidth is close to $N$, hence it may not provide any useful information with
regard to estimation performance.

We now compare our error bound with the following results, where signals defined on $\mathbb{R}$ are
considered: In [34], the mean-square error of approximating a possibly non-bandlimited wide-sense
stationary (w.s.s.) signal using a sampling expansion is considered and a uniform upper bound in terms of
power outside the bandwidth of approximation is derived. Here we are interested in the average error
over all points of the $N$-dimensional vector. Our method of approximation of the signal is possibly
different, since we use the MMSE estimator. As a result our bound also makes use of the shape of the
eigenvalue distribution. [35] states that a w.s.s. signal is determined linearly by its samples if some set
of frequencies containing all of the power of the process is disjoint from each of its translates, where the
amount of translation is determined by the sampling rate. Here, for circularly w.s.s. signals, we show a similar
result: if there is a set $J$ that consists of indices which do not overlap when shifted by $M$, and has all
the power, the error will be zero. In fact, we show a more general result for our set-up: we show that
two times the power outside this set $J$ provides an upper bound for the error, hence putting a bound
on the error even if it is not exactly zero.

6 Discussion and Conclusions 

We have considered the transmission of a Gaussian vector source over a multi-dimensional Gaussian 
channel where a random or a fixed subset of the channel outputs are erased. We have considered the 
setup where the only encoding operation allowed is a linear unitary transformation on the source. We 
have investigated the MMSE performance both in average and in terms of guarantees that hold with 
high probability as a function of system parameters. We have assumed that the receiver knows the 
channel realization. 

In addition to providing insights into the problem of unitary encoding in Gaussian erasure channels,
our work also contributes to our understanding of the relationship between the MMSE and the total
uncertainty in the signal as quantified by information theoretic measures such as entropy (eigenvalues)
and the spread of this uncertainty (basis). We believe that through this relationship our work also sheds
light on how to properly characterize the concept of "coherence". Coherence, a concept describing the
overall correlatedness of a random field, is of central importance in statistical optics; see for example
[36], [37] and the references therein.

We have first considered random channels and focused on the average performance. We have
considered two channel structures: i) the random Gaussian scalar channel, where only one measurement is
done through Gaussian noise, and ii) the Gaussian erasure channel, where measurements are done through
parallel Gaussian channels with a given channel erasure probability. Under these channel structures, we
have formulated the problem of finding the most favorable unitary transform under the average performance
criterion. We have investigated the convexity properties of this optimization problem, and obtained
conditions of optimality through variational equalities. We were not able to solve this problem in its
full setting, but we have solved some related special cases. Among these we have identified special
cases where DFT-like unitary transforms (unitary transforms with $|u_{ij}|^2 = \frac{1}{N}$) turn out to be the best
coordinate transforms, possibly along with other unitary transforms. Although these observations and
the observations of Section 3 (which are based on compressive sensing results) may suggest the idea
that the DFT matrix may indeed be an optimum unitary matrix for any eigenvalue distribution, we
have provided a counterexample.

In Section 3 and Section 4 we have illustrated how some recent results in matrix theory, mostly
presented in the compressive sampling framework, can be used to find performance bounds for MMSE
estimation. In this part we have provided performance guarantees that hold with high probability. We
have considered three set-ups: i) sampling of a signal with fixed support at measurement locations chosen
uniformly at random in a fixed domain; ii) sampling of a signal with uniformly random support at
fixed measurement locations in a fixed measurement domain; iii) random projections (random channel
matrix with i.i.d. Gaussian entries) where the eigenvalue distribution of the covariance matrix is
arbitrary. For the first two cases, we have investigated bounds on the MMSE depending on the support
size and the number of measurements. For the third case, we have illustrated the interplay between the
amount of information in the signal and the spread of this information in the measurement domain
for providing performance guarantees.

Finally we have considered circularly wide-sense stationary signals, which are a natural way to model
wide-sense stationary signals in finite dimension. In this section the covariance matrix was circulant
by assumption, hence the unitary transform was fixed and given by the DFT matrix. We have noted
that the results of Section 3 are applicable to c.w.s.s. signals. For instance, when these signals have a
flat nonzero eigenvalue spectrum, they can be estimated with zero MMSE with high probability with
a given number of noiseless measurements whose locations are chosen uniformly at random. In this part,
we have focused on equidistant sampling and given the explicit expression for the MMSE. We have also
shown that two times the total power outside a properly chosen set of indices (a set of indices which
do not overlap when shifted by an amount determined by the sampling rate) provides an upper bound
for the MMSE. We have observed that the notion of such a set of indices generalizes the conventional
sense of bandlimited signals. Our results showed that the error will be zero if there is such a set of
indices that contains all of the power, even if the signal is not band-limited (low-pass, high-pass) in the
conventional sense.



7 Appendix 

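7.1 Proof of Lemma 2.1 

The left hand side of the unitary matrix constraint in (16) may be rewritten as 

\begin{align}
e_i^{\mathrm T}(U_B^{\dagger}U_B - I_{|B|})e_k &= e_i^{\mathrm T}\big((U_{B,R}+jU_{B,I})^{\dagger}(U_{B,R}+jU_{B,I}) - I_{|B|}\big)e_k \tag{95}\\
&= e_i^{\mathrm T}\big((U_{B,R}^{\mathrm T}-jU_{B,I}^{\mathrm T})(U_{B,R}+jU_{B,I}) - I_{|B|}\big)e_k \tag{96}\\
&= e_i^{\mathrm T}(U_{B,R}^{\mathrm T}U_{B,R}+U_{B,I}^{\mathrm T}U_{B,I})e_k + j\,e_i^{\mathrm T}(U_{B,R}^{\mathrm T}U_{B,I}-U_{B,I}^{\mathrm T}U_{B,R})e_k - e_i^{\mathrm T}I_{|B|}e_k. \notag
\end{align}

Hence the constraint becomes 

\begin{equation}
e_i^{\mathrm T}(U_{B,R}^{\mathrm T}U_{B,R}+U_{B,I}^{\mathrm T}U_{B,I})e_k + j\,e_i^{\mathrm T}(U_{B,R}^{\mathrm T}U_{B,I}-U_{B,I}^{\mathrm T}U_{B,R})e_k = e_i^{\mathrm T}I_{|B|}e_k. \tag{97}
\end{equation}

By considering the real and imaginary parts of the equality separately, these constraints may be 
expressed as 

\begin{align}
e_i^{\mathrm T}(U_{B,R}^{\mathrm T}U_{B,R}+U_{B,I}^{\mathrm T}U_{B,I})e_k &= e_i^{\mathrm T}I_{|B|}e_k, \quad (i,k)\in\gamma \tag{98}\\
e_i^{\mathrm T}(U_{B,R}^{\mathrm T}U_{B,I}-U_{B,I}^{\mathrm T}U_{B,R})e_k &= 0, \quad (i,k)\in\bar\gamma \tag{99}
\end{align}

where $\gamma = \{(i,k)\,|\,i = 1,\ldots,|B|,\ k = 1,\ldots,i\}$ and $\bar\gamma = \{(i,k)\,|\,i = 1,\ldots,|B|,\ k = 1,\ldots,i-1\}$. For 
the $i = k$ case, we only consider the real part of the constraint since the imaginary part necessarily 
vanishes, i.e. $e_i^{\mathrm T}(U_B^{\dagger}U_B)e_i = u_i^{\dagger}u_i \in \mathbb{R}$. 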

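The set of constraint gradients with respect to $\begin{bmatrix}U_{B,R}\\ U_{B,I}\end{bmatrix}$ can be expressed as 

\begin{equation}
\left\{\begin{bmatrix}U_{B,R}(e_ie_k^{\mathrm T}+e_ke_i^{\mathrm T})\\ U_{B,I}(e_ie_k^{\mathrm T}+e_ke_i^{\mathrm T})\end{bmatrix}\,\Big|\,(i,k)\in\gamma\right\}
\cup
\left\{\begin{bmatrix}U_{B,I}(-e_ie_k^{\mathrm T}+e_ke_i^{\mathrm T})\\ U_{B,R}(e_ie_k^{\mathrm T}-e_ke_i^{\mathrm T})\end{bmatrix}\,\Big|\,(i,k)\in\bar\gamma\right\} \tag{100}
\end{equation}

where we have used the following identities [38] 

\begin{align}
d(\mathrm{tr}(A_1X^{\mathrm T}A_2)) &= d(\mathrm{tr}(A_2^{\mathrm T}XA_1^{\mathrm T})) \tag{101}\\
&= \mathrm{tr}(A_2^{\mathrm T}\,dX\,A_1^{\mathrm T}) \tag{102}\\
&= \mathrm{tr}(A_1^{\mathrm T}A_2^{\mathrm T}\,dX) \tag{103}
\end{align}

and 

\begin{align}
d(\mathrm{tr}(X^{\mathrm T}A_2XA_1)) &= d(\mathrm{tr}(XA_1X^{\mathrm T}A_2)) \tag{104}\\
&= \mathrm{tr}(dX\,A_1X^{\mathrm T}A_2 + XA_1\,d(X^{\mathrm T})A_2) \tag{105}\\
&= \mathrm{tr}(A_1X^{\mathrm T}A_2\,dX + d(X^{\mathrm T})A_2XA_1) \tag{106}\\
&= \mathrm{tr}(A_1X^{\mathrm T}A_2\,dX + A_1^{\mathrm T}X^{\mathrm T}A_2^{\mathrm T}\,dX) \tag{107}
\end{align}

where $X$ is the matrix variable defined on real numbers and $A_1$ and $A_2$ are constant real matrices. 
For instance, with $U_{B,R}$ as the variable, $d(\mathrm{tr}(e_i^{\mathrm T}(U_{B,R}^{\mathrm T}U_{B,R})e_k)) = d(\mathrm{tr}(U_{B,R}^{\mathrm T}U_{B,R}\,e_ke_i^{\mathrm T})) = \mathrm{tr}((e_ke_i^{\mathrm T} + e_ie_k^{\mathrm T})U_{B,R}^{\mathrm T}\,dU_{B,R})$ with $A_1 = e_ke_i^{\mathrm T}$ and $A_2 = I_N$. 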
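The trace-differential identity $d(\mathrm{tr}(X^{\mathrm T}A_2XA_1)) = \mathrm{tr}((A_1X^{\mathrm T}A_2 + A_1^{\mathrm T}X^{\mathrm T}A_2^{\mathrm T})\,dX)$ can be sanity-checked numerically. The following is a minimal sketch, not part of the original derivation: the matrix sizes, the random seed and the perturbation scale are arbitrary assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 4, 3                                   # illustrative sizes (assumed)
X = rng.standard_normal((n, m))               # real matrix variable
dX = 1e-6 * rng.standard_normal((n, m))       # small perturbation of X
A1 = rng.standard_normal((m, m))              # constant real matrices
A2 = rng.standard_normal((n, n))

def f(Y):
    # f(Y) = tr(Y^T A2 Y A1)
    return np.trace(Y.T @ A2 @ Y @ A1)

# Actual change of f under the perturbation dX.
exact = f(X + dX) - f(X)

# First-order change predicted by the identity.
first_order = np.trace((A1 @ X.T @ A2 + A1.T @ X.T @ A2.T) @ dX)

# The two differ only by the second-order term tr(dX^T A2 dX A1).
print(abs(exact - first_order))
```

The residual is of the order of the squared perturbation, which is what one expects if the first-order term is correct.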

The linear independence of the elements of this set follows from the following fact: For any matrix 
$U_B \in \mathbb{C}^{N\times|B|}$ satisfying $U_B^{\dagger}U_B = I_{|B|}$, the matrix 

\begin{equation}
\bar U_B = \begin{bmatrix}U_{B,R} & -U_{B,I}\\ U_{B,I} & U_{B,R}\end{bmatrix} \in \mathbb{R}^{2N\times 2|B|} \notag
\end{equation}

satisfies $\bar U_B^{\mathrm T}\bar U_B = I_{2|B|}$. Hence the columns of $\bar U_B$ form an orthonormal set of vectors. We observe that the elements of 
the constraint gradient set given in (100) are matrices with zero entries except at the $k$-th and $i$-th columns, 
where at these two (or one if $i = k$) column(s), we have columns from $\bar U_B$. For instance consider 

\begin{equation}
\begin{bmatrix}U_{B,R}(e_ie_k^{\mathrm T}+e_ke_i^{\mathrm T})\\ U_{B,I}(e_ie_k^{\mathrm T}+e_ke_i^{\mathrm T})\end{bmatrix} \notag
\end{equation}

for some $(i,k)\in\gamma$, and let $i \neq k$. This is a matrix of zeros except that at the $k$-th 
column we have the $i$-th column of $\bar U_B$ and at the $i$-th column we have the $k$-th column of $\bar U_B$. Now since $\bar U_B$ has 
orthonormal columns, it is not possible to form the values at the $k$-th and $i$-th columns using other columns 
of $\bar U_B$, and hence other elements of the set given in (100). Similar arguments hold for all the other 
elements of the set in (100). Hence the constraint gradients are linearly independent for any matrix 
$U_B \in \mathbb{C}^{N\times|B|}$ satisfying $U_B^{\dagger}U_B = I_{|B|}$. 
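The two facts used in this proof can be checked numerically for a randomly drawn matrix: the real embedding $[U_{B,R}, -U_{B,I};\ U_{B,I}, U_{B,R}]$ of a complex matrix with orthonormal columns has orthonormal columns, and the $|B|^2$ constraint gradients are linearly independent. The sketch below is only an illustration under assumed sizes ($N = 6$, $|B| = 3$) and an assumed way of generating $U_B$ (QR factorization of a random complex matrix); it is not a substitute for the argument above.

```python
import numpy as np

rng = np.random.default_rng(0)
N, B = 6, 3                                    # illustrative sizes (assumed)

# Random complex N x B matrix with orthonormal columns (U_B^H U_B = I).
Z = rng.standard_normal((N, B)) + 1j * rng.standard_normal((N, B))
U_B, _ = np.linalg.qr(Z)
U_R, U_I = U_B.real, U_B.imag

# Real embedding of U_B; its columns should be orthonormal.
U_bar = np.block([[U_R, -U_I], [U_I, U_R]])
is_orthonormal = np.allclose(U_bar.T @ U_bar, np.eye(2 * B))

def unit(i, k):
    # e_i e_k^T as a B x B matrix
    m = np.zeros((B, B))
    m[i, k] = 1.0
    return m

grads = []
for i in range(B):
    for k in range(i + 1):                     # real-part constraints, k <= i
        S = unit(i, k) + unit(k, i)
        grads.append(np.vstack([U_R @ S, U_I @ S]).ravel())
    for k in range(i):                         # imaginary-part constraints, k < i
        A = unit(k, i) - unit(i, k)
        grads.append(np.vstack([U_I @ A, -U_R @ A]).ravel())

# Linear independence means the stacked gradients have full rank B^2.
rank = np.linalg.matrix_rank(np.array(grads))
print(is_orthonormal, rank, B * B)
```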



7.2 Lagrangian for optimizing a real valued function of a matrix variable with 
complex entries under equality constraints 

We now clarify the form of the Lagrangian in (22). Let $f_0(U_B)$ be the function to be optimized with 
complex equality constraints $f_{i,k}(U_B) = 0 \in \mathbb{C}$, $(i,k)\in\bar\gamma$, with $|\bar\gamma| = N_1 = 0.5|B|(|B|-1)$, and the real 
equality constraints $h_k(U_B) = 0 \in \mathbb{R}$, $k = 1,\ldots,N_2 = |B|$. The $N_1$ complex equality constraints can be 
expressed equivalently as $2N_1$ real equality constraints $\Re\{f_{i,k}(U_B)\} = 0 \in \mathbb{R}$ and $\Im\{f_{i,k}(U_B)\} = 0 \in \mathbb{R}$ 
for $(i,k)\in\bar\gamma$. Then the Lagrangian can be expressed as 

\begin{align}
L(U_B,\nu,\upsilon) &= f_0(U_B) + \sum_{(i,k)\in\bar\gamma}\nu_{i,k,R}\,\Re\{f_{i,k}(U_B)\} + \sum_{(i,k)\in\bar\gamma}\nu_{i,k,I}\,\Im\{f_{i,k}(U_B)\} + \sum_{k=1}^{N_2}\upsilon_kh_k(U_B) \tag{108}\\
&= f_0(U_B) + \sum_{(i,k)\in\bar\gamma}\Re\{\nu_{i,k}^{*}f_{i,k}(U_B)\} + \sum_{k=1}^{N_2}\upsilon_kh_k(U_B) \tag{109}\\
&= f_0(U_B) + 0.5\sum_{(i,k)\in\bar\gamma}\nu_{i,k}f_{i,k}^{*}(U_B) + 0.5\sum_{(i,k)\in\bar\gamma}\nu_{i,k}^{*}f_{i,k}(U_B) + \sum_{k=1}^{N_2}\upsilon_kh_k(U_B) \tag{110}
\end{align}

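where $\nu_{i,k}\in\mathbb{C}$, with $\Re\{\nu_{i,k}\} = \nu_{i,k,R}$, $\Im\{\nu_{i,k}\} = \nu_{i,k,I}$, and $\upsilon_k\in\mathbb{R}$ are Lagrange multipliers. Now 
(22) is obtained with $f_0(U_B) = E[\mathrm{tr}((\Lambda_{x,B}^{-1} + \frac{1}{\sigma_n^2}U_B^{\dagger}H^{\dagger}HU_B)^{-1})]$, $f_{i,k}(U_B) = e_i^{\mathrm T}(U_B^{\dagger}U_B - I_{|B|})e_k$, 
$h_k(U_B) = e_k^{\mathrm T}(U_B^{\dagger}U_B - I_{|B|})e_k$ and absorbing any constants into the Lagrange multipliers. 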

7.3 Proof of Lemma 4.1 

Our aim is to show that the smallest eigenvalue of $A = \Lambda_x^{-1} + \frac{1}{\sigma_n^2}H^{\dagger}H$ is bounded from below by a 
sufficiently large number with high probability. That is, we are interested in 

\begin{equation}
\inf_{x\in S^{N-1}} x^{\dagger}\Lambda_x^{-1}x + \frac{1}{\sigma_n^2}x^{\dagger}H^{\dagger}Hx. \tag{111}
\end{equation}

To lower bound the smallest eigenvalue, we adopt the approach proposed by [32]: we consider the 
decomposition of the unit sphere into two sets, compressible vectors and incompressible vectors. We 
recall the following definitions from [32]. 



Definition 7.1 [p. 14, [32]] Let $|\mathrm{supp}(x)|$ denote the number of elements in the support of $x$. Let 
$\eta,\rho\in(0,1)$. $x\in\mathbb{R}^N$ is sparse if $|\mathrm{supp}(x)| \le \eta N$. The set of vectors sparse with a given $\eta$ is denoted by 
$Sparse(\eta)$. $x\in S^{N-1}$ is compressible if $x$ is within a Euclidean distance $\rho$ from the set of all sparse 
vectors, that is, $\exists y\in Sparse(\eta)$, $d(x,y)\le\rho$. The set of compressible vectors is denoted by $Comp(\eta,\rho)$. 
$x\in S^{N-1}$ is incompressible if it is not compressible. The set of incompressible vectors is denoted by 
$Incomp(\eta,\rho)$. 
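To make Definition 7.1 concrete, the following minimal sketch tests whether a given unit vector is compressible. It relies on the observation that the sparse vector closest to $x$ keeps the $\lfloor\eta N\rfloor$ largest-magnitude entries of $x$ and zeroes the rest, so the distance to $Sparse(\eta)$ is the norm of the discarded entries. The function names and the parameter values $\eta = 0.1$, $\rho = 0.5$ are assumptions made for illustration only.

```python
import math

def dist_to_sparse(x, eta):
    """Euclidean distance from x to the set of vectors with at most eta*N nonzeros."""
    n_keep = math.floor(eta * len(x))
    # The nearest sparse vector keeps the n_keep largest-magnitude entries,
    # so the distance is the norm of the remaining (smaller) entries.
    tail = sorted((abs(v) for v in x), reverse=True)[n_keep:]
    return math.sqrt(sum(v * v for v in tail))

def is_compressible(x, eta, rho):
    return dist_to_sparse(x, eta) <= rho

N = 100
eta, rho = 0.1, 0.5                       # illustrative parameters (assumed)
spiky = [1.0] + [0.0] * (N - 1)           # unit vector with a single nonzero entry
flat = [1.0 / math.sqrt(N)] * N           # unit vector with fully spread energy

print(is_compressible(spiky, eta, rho))   # True: the vector is itself sparse
print(is_compressible(flat, eta, rho))    # False: about 90% of the energy lies outside any 10 entries
```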



Lemma 7.1 [Lemma 3.4, [32]] Let $x\in Incomp(\eta,\rho)$. Then there exists a set $\psi\subseteq\{1,\ldots,N\}$ of 
cardinality $|\psi| \ge 0.5\rho^2\eta N$ such that 

\begin{equation}
\frac{\rho}{\sqrt{2N}} \le |x_k| \le \frac{1}{\sqrt{\eta N}} \quad \text{for all } k\in\psi. \tag{112}
\end{equation}

We note that the sets of compressible and incompressible vectors provide a decomposition of the 
unit sphere, i.e. $S^{N-1} = Incomp(\eta,\rho)\cup Comp(\eta,\rho)$ [32]. We will show that the first/second term in 
(111) is sufficiently away from zero for $x\in Incomp(\eta,\rho)$ / $x\in Comp(\eta,\rho)$, respectively. 

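As noted in [32], 

\begin{equation}
P\Big(\inf_{x\in S^{N-1}}x^{\dagger}Ax < C\frac{N}{P}\Big) \le P\Big(\inf_{x\in Comp(\eta,\rho)}x^{\dagger}Ax < C\frac{N}{P}\Big) + P\Big(\inf_{x\in Incomp(\eta,\rho)}x^{\dagger}Ax < C\frac{N}{P}\Big). \tag{113}
\end{equation}

We also note that 

\begin{equation}
\inf_{x\in Incomp(\eta,\rho)} x^{\dagger}\Lambda_x^{-1}x + \frac{1}{\sigma_n^2}x^{\dagger}H^{\dagger}Hx \ \ge \inf_{x\in Incomp(\eta,\rho)} x^{\dagger}\Lambda_x^{-1}x = \inf_{x\in Incomp(\eta,\rho)}\|\Lambda_x^{-1/2}x\|^2 \tag{114}
\end{equation}

and 

\begin{equation}
\inf_{x\in Comp(\eta,\rho)} x^{\dagger}\Lambda_x^{-1}x + \frac{1}{\sigma_n^2}x^{\dagger}H^{\dagger}Hx \ \ge \inf_{x\in Comp(\eta,\rho)} \frac{1}{\sigma_n^2}x^{\dagger}H^{\dagger}Hx = \frac{1}{\sigma_n^2}\Big(\inf_{x\in Comp(\eta,\rho)}\|Hx\|^2\Big) \tag{115}
\end{equation}

where the inequalities are due to the fact that $\Lambda_x^{-1}$ and $H^{\dagger}H$ are both positive semidefinite. 
We first consider the following special case of [32, Lemma 3.3]: 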



Lemma 7.2 [[32], Lemma 3.3] Let $H$ be an $M = \beta N \times N$ random matrix with i.i.d. Gaussian entries 
with variances at least 1. Then there exist $\eta,\rho,C_2,C_1 > 0$ that do not depend on $N$ such that 

\begin{equation}
P\Big(\inf_{x\in Comp(\eta,\rho)}\|Hx\|^2 \le C_2N\Big) \le e^{-C_1N}. \tag{116}
\end{equation}

To see the relationship between the number of measurements and the parameters of the lemma, we 
take a closer look at the proof of this lemma: We observe that here $H$ is an $M = \beta N\times N$ matrix, hence 
[32, Proposition 2.5] requires $\eta N < \delta_0M$, where $0 < \delta_0 < 0.5$ is a parameter of [32, Proposition 2.5]. 
Hence $M$ should satisfy $M > T'$, where $T' = \frac{1}{\delta_0}\eta N$. 

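We now look at $\inf_{x\in Incomp(\eta,\rho)}\|\Lambda_x^{-1/2}x\|^2$. We note that none of the entities in this expression is 
random. We note the following 

\begin{align}
\inf_{x\in Incomp(\eta,\rho)}\|\Lambda_x^{-1/2}x\|^2 &= \inf_{x\in Incomp(\eta,\rho)}\sum_{i=1}^{N}\frac{1}{\lambda_i}|x_i|^2 \tag{117}\\
&\ge \inf_{x\in Incomp(\eta,\rho)}\sum_{i\in\psi}\frac{1}{\lambda_i}\frac{\rho^2}{2N} \tag{118}
\end{align}

where the inequality is due to Lemma 7.1. We observe that to have this expression sufficiently bounded 
away from zero, the distribution of $\frac{1}{\lambda_i}$ should be spread enough. 

Different approaches to quantify the spread of the eigenvalue distribution can be adopted. One 
may directly quantify the spread of the $\frac{1}{\lambda_i}$ distribution, for example by requiring $[\frac{1}{\lambda_1},\ldots,\frac{1}{\lambda_N}]/\sum_i\frac{1}{\lambda_i} \in 
Incomp(\bar\eta,\bar\rho)$, where $\bar\eta,\bar\rho$ are new parameters. Since it is more desirable to have explicit constraints on 
the $\lambda_i$ distribution itself instead of constraints on the distribution of $\frac{1}{\lambda_i}$, we consider another approach. 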

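Let us assume that $\lambda_i < C_\lambda\frac{P}{N}$ for $i > \kappa|\psi|$, where $\kappa\in(0,1)$, $0 < C_\lambda < \infty$. Then we have 

\begin{align}
\inf_{x\in Incomp(\eta,\rho)}\|\Lambda_x^{-1/2}x\|^2 &\ge \sum_{i\in\psi}\frac{1}{\lambda_i}\frac{\rho^2}{2N} \tag{119}\\
&\ge (|\psi|-\kappa|\psi|)\frac{N}{C_\lambda P}\frac{\rho^2}{2N} \tag{120}\\
&\ge (1-\kappa)\,0.5\rho^2\eta N\,\frac{N}{C_\lambda P}\frac{\rho^2}{2N} \tag{121}\\
&= (1-\kappa)\,0.25\rho^4\eta\frac{1}{C_\lambda}\frac{N}{P} \tag{122}\\
&= \frac{1}{P}C_3N \tag{123}
\end{align}

where we have used $|\psi| \ge 0.5\rho^2\eta N$. Here $C_3 = (1-\kappa)\,0.25\rho^4\eta\frac{1}{C_\lambda}$. 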

We will now complete the argument to arrive at $P(\inf_{x\in S^{N-1}}x^{\dagger}Ax < C\frac{N}{P}) \le e^{-C_1N}$ as claimed 
in the lemma we are proving, and then discuss the effect of different eigenvalue distributions, noise 
level and $M$ on this result. Let $C = P\min(\frac{1}{\sigma_n^2}C_2, \frac{1}{P}C_3) = \min(\frac{P}{\sigma_n^2}C_2, C_3)$. By (114) and (123), 
$P(\inf_{x\in Incomp(\eta,\rho)}x^{\dagger}Ax < C\frac{N}{P}) = 0$. By (115) and Lemma 7.2, $P(\inf_{x\in Comp(\eta,\rho)}x^{\dagger}Ax < C\frac{N}{P}) \le e^{-C_1N}$. 
The result follows by (113). 

Up to now, we have not considered the admissibility of $C$ to provide guarantees for low values 
of error. We note that, as observed in Remark 7.1 and Remark 7.2, the error bound expression in 
Theorem 4.1 cannot provide bounds for low values of error when the eigenvalue distribution is spread. 
Hence while stating the result of Lemma 4.1, and hence Theorem 4.1, we consider the other case, the case 
where the eigenvalue distribution is not spread out, as discussed in Remark 7.3. 



Remark 7.1 We note that as $C = P\min(\frac{1}{\sigma_n^2}C_2, \frac{1}{P}C_3) = \min(\frac{P}{\sigma_n^2}C_2, C_3)$ gets larger, the lower bound 
on the eigenvalues of $\Lambda_x^{-1} + \frac{1}{\sigma_n^2}H^{\dagger}H$ gets larger, and the bound on the MMSE (see for example (89)) 
gets smaller. To have guarantees for low values of error for a given $M$, we want to have $C$ as 
large as possible. For a given number of measurements $M$, we have a $C_2$ and associated $\eta$, $\rho$, $C_1$. For 
a given $P$ and $\sigma_n^2$, to have guarantees for error levels as low as this $C_2$, $P$ and $\sigma_n^2$ permit, we should 
have $\frac{P}{\sigma_n^2}C_2 \le C_3$ so that the overall constant is as good as the one coming from Lemma 7.2. We note 
that to have $C_3$ large, $C_\lambda$ must be small. 



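Remark 7.2 Let us assume that all the eigenvalues are approximately equal, i.e. $|\lambda_i - \frac{P}{N}| \le q\frac{P}{N}$, 
$q\in[0,1]$, where $q$ is close to 0. We have 

\begin{align}
\inf_{x\in Incomp(\eta,\rho)}\|\Lambda_x^{-1/2}x\|^2 &\ge \sum_{i\in\psi}\frac{1}{1+q}\frac{N}{P}\frac{\rho^2}{2N} \tag{124}\\
&\ge \frac{1}{1+q}\frac{N}{P}\frac{\rho^2}{2N}\,0.5\rho^2\eta N \tag{125}\\
&= \frac{1}{1+q}\,0.25\rho^4\eta N\frac{1}{P}. \tag{126}
\end{align}

Hence $C_3 = \frac{1}{1+q}0.25\rho^4\eta > 0$. In this case (89) will not provide guarantees for low values of error. In 
fact, the error may be lower bounded as follows 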

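\begin{align}
E[\|x - E[x|y]\|^2] &= \mathrm{tr}\Big(\big(\Lambda_x^{-1} + \tfrac{1}{\sigma_n^2}H^{\dagger}H\big)^{-1}\Big) \tag{127}\\
&= \sum_{i=1}^{N}\frac{1}{\lambda_i(\Lambda_x^{-1} + \frac{1}{\sigma_n^2}H^{\dagger}H)} \tag{128}\\
&= \sum_{i=M+1}^{N}\frac{1}{\lambda_i(\Lambda_x^{-1} + \frac{1}{\sigma_n^2}H^{\dagger}H)} + \sum_{i=1}^{M}\frac{1}{\lambda_i(\Lambda_x^{-1} + \frac{1}{\sigma_n^2}H^{\dagger}H)} \tag{129}\\
&\ge \sum_{i=M+1}^{N}\frac{1}{\lambda_{i-M}(\Lambda_x^{-1})} + \sum_{i=1}^{M}\frac{1}{\lambda_i(\Lambda_x^{-1} + \frac{1}{\sigma_n^2}H^{\dagger}H)} \tag{130}\\
&= \sum_{i=M+1}^{N}\lambda_{N-i+M+1}(\Lambda_x) + \sum_{i=1}^{M}\frac{1}{\lambda_i(\Lambda_x^{-1} + \frac{1}{\sigma_n^2}H^{\dagger}H)} \tag{131}\\
&= \sum_{i=M+1}^{N}\lambda_i(\Lambda_x) + \sum_{i=1}^{M}\frac{1}{\lambda_i(\Lambda_x^{-1} + \frac{1}{\sigma_n^2}H^{\dagger}H)} \tag{132}\\
&\ge (N-M)(1-q)\frac{P}{N} + \sum_{i=1}^{M}\frac{1}{\lambda_i(\Lambda_x^{-1} + \frac{1}{\sigma_n^2}H^{\dagger}H)} \tag{133}
\end{align}

where in (130) we have used case (b) of Lemma 2.2 and the fact that $H^{\dagger}H$ is at most rank $M$. We 
note that as $q$ gets closer to 0, the first term gets closer to $\frac{N-M}{N}P$. 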

Remark 7.3 Let D(5) be the smallest number satisfying ^2d = i \ > SP, where 5 £ (0, 1]. Let D(5) = 
aN ' , a G (0, 1]. Let D(5) be sufficiently small for 5 sufficiently large, more precisely D(5) = aN < K\ip\, 
k G (0, 1), Aj < C\jj, for i > n\tp\ with C\ = q ^Z^) > w ^ 1 > 9 > 0. Hence we have Aj < q j^E^m > 
i > KaN . We observe that other parametes fixed, as admissible a > gets closer to 0, or 5 > gets 
close to 1, $C_\lambda$ gets smaller as desired. We note that the inequality $D(\delta) \le 0.5\kappa\rho^2\eta N = T$ together 
with the inequality $M > T' = \frac{1}{\delta_0}\eta N$ relates the spread of the eigenvalues to the admissible number of 
measurements. 



Remark 7.4 We now discuss the effect of the noise level. We note that the total signal power is given 
by $\mathrm{tr}(K_x) = P$, whereas each measurement is done with noise whose variance is $\sigma_n^2$. We want to have 
$C = P\min(\frac{1}{\sigma_n^2}C_2, \frac{1}{P}C_3) = \min(\frac{P}{\sigma_n^2}C_2, C_3)$ as large as possible. Let us assume that the other parameters of 
the problem are fixed and focus on the ratio $\frac{P}{\sigma_n^2}$. For constant $P$, as the noise level increases, $\frac{P}{\sigma_n^2}$ decreases. 
After some noise level, the minimum will be given by $\frac{P}{\sigma_n^2}C_2$. Hence the lower bound on the eigenvalues 
of $\Lambda_x^{-1} + \frac{1}{\sigma_n^2}H^{\dagger}H$ will get smaller, and the upper bound on the MMSE will get larger. Hence Theorem 4.1 
will not provide guarantees for low values of error for high levels of noise. 

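7.4 Proof of Lemma 5.1 

We recall that in this section $u_{tk} = \frac{1}{\sqrt{N}}e^{j\frac{2\pi}{N}tk}$, $0 \le t,k \le N-1$, and the associated eigenvalues 
are denoted by $\lambda_k$ without reindexing them in decreasing/increasing order. We first assume that 
$K_y = E[yy^{\dagger}] = HK_xH^{\dagger}$ is non-singular. The generalization to the case where $K_y$ may be singular 
is presented at the end of the proof. 

The MMSE for estimating $x$ from $y$ is given by [21, Ch.2] 

\begin{align}
E[\|x - E[x|y]\|^2] &= \mathrm{tr}(K_x - K_{xy}K_y^{-1}K_{xy}^{\dagger}) \tag{134}\\
&= \mathrm{tr}(U\Lambda_xU^{\dagger} - U\Lambda_xU^{\dagger}H^{\dagger}(HU\Lambda_xU^{\dagger}H^{\dagger})^{-1}HU\Lambda_xU^{\dagger}) \tag{135}\\
&= \mathrm{tr}(\Lambda_x - \Lambda_xU^{\dagger}H^{\dagger}(HU\Lambda_xU^{\dagger}H^{\dagger})^{-1}HU\Lambda_x). \tag{136}
\end{align}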

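We now consider $HU\in\mathbb{C}^{M\times N}$ and try to understand its structure: 

\begin{equation}
(HU)_{lk} = \frac{1}{\sqrt{N}}e^{j\frac{2\pi}{N}(\Delta N\,l)k} = \frac{1}{\sqrt{N}}e^{j\frac{2\pi}{M}lk} \tag{137}
\end{equation}

where $0 \le l \le \frac{N}{\Delta N}-1$, $0 \le k \le N-1$. We now observe that for a given $l$, $e^{j\frac{2\pi}{M}lk}$ is a periodic function 
of $k$ with period $M = \frac{N}{\Delta N}$. So the $l$-th row of $HU$ can be expressed as 

\begin{align}
(HU)_{l:} &= \frac{1}{\sqrt{N}}\big[e^{j\frac{2\pi}{M}l[0\ldots N-1]}\big] \tag{138}\\
&= \frac{1}{\sqrt{N}}\big[e^{j\frac{2\pi}{M}l[0\ldots M-1]}\,\big|\,\ldots\,\big|\,e^{j\frac{2\pi}{M}l[0\ldots M-1]}\big]. \tag{139}
\end{align}

Let $U_M$ denote the $M\times M$ DFT matrix, i.e. $(U_M)_{lk} = \frac{1}{\sqrt{M}}e^{j\frac{2\pi}{M}lk}$ with $0 \le l \le M-1$, $0 \le k \le M-1$. 
Hence $HU$ is the matrix formed by stacking $\Delta N$ $M\times M$ DFT matrices side by side: 

\begin{equation}
HU = \frac{1}{\sqrt{\Delta N}}[U_M|\ldots|U_M]. \tag{140}
\end{equation}

Now we consider the covariance matrix of the observations $K_y = HK_xH^{\dagger} = HU\Lambda_xU^{\dagger}H^{\dagger}$. We first 
express $\Lambda_x$ as a block diagonal matrix as follows 

\begin{equation}
\Lambda_x = \begin{bmatrix}\lambda_0 & & 0\\ & \ddots & \\ 0 & & \lambda_{N-1}\end{bmatrix} = \begin{bmatrix}\Lambda_x^{0} & & 0\\ & \ddots & \\ 0 & & \Lambda_x^{\Delta N-1}\end{bmatrix}. \tag{141}
\end{equation}

Hence $\Lambda_x = \mathrm{diag}(\Lambda_x^{i})$ with $\Lambda_x^{i} = \mathrm{diag}(\lambda_{iM+k})\in\mathbb{R}^{M\times M}$, where $0 \le i \le \Delta N-1$, $0 \le k \le M-1$. We 
can write $K_y$ as 

\begin{align}
K_y &= HU\Lambda_xU^{\dagger}H^{\dagger} \tag{142}\\
&= \frac{1}{\sqrt{\Delta N}}[U_M|\ldots|U_M]\,\mathrm{diag}(\Lambda_x^{i})\,\frac{1}{\sqrt{\Delta N}}\begin{bmatrix}U_M^{\dagger}\\ \vdots\\ U_M^{\dagger}\end{bmatrix} \tag{143}\\
&= \frac{1}{\Delta N}U_M\Big(\sum_{i=0}^{\Delta N-1}\Lambda_x^{i}\Big)U_M^{\dagger}. \tag{144}
\end{align}

We note that $\sum_{i=0}^{\Delta N-1}\Lambda_x^{i}\in\mathbb{R}^{M\times M}$ is formed by summing diagonal matrices, hence is also diagonal. 
Since $U_M$ is the $M\times M$ DFT matrix, $K_y$ is again a circulant matrix whose $k$-th eigenvalue is given by 
$\frac{1}{\Delta N}\sum_{i=0}^{\Delta N-1}\lambda_{iM+k}$. Hence $K_y = U_M\Lambda_yU_M^{\dagger}$ is the eigenvalue-eigenvector decomposition of $K_y$, where 
$\Lambda_y = \frac{1}{\Delta N}\sum_{i=0}^{\Delta N-1}\Lambda_x^{i} = \mathrm{diag}(\lambda_{y,k})$ with $\lambda_{y,k} = \frac{1}{\Delta N}\sum_{i=0}^{\Delta N-1}\lambda_{iM+k}$, $0 \le k \le M-1$. We note that 
there may be aliasing in the eigenvalue spectrum of $K_y$ depending on the eigenvalue spectrum of $K_x$ 
and $\Delta N$. We also note that $K_y$ may be aliasing free even if it is not bandlimited (low-pass, high-pass, 
etc.) in the conventional sense. 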
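The structure of $HU$ and the folded eigenvalue spectrum of $K_y$ can be verified numerically on a small instance. The sketch below is an illustration under assumptions that are not part of the proof: $N = 12$, $\Delta N = 3$, randomly drawn eigenvalues, the DFT sign convention $e^{+j2\pi tk/N}$, and a sampling matrix $H$ that keeps every $\Delta N$-th sample starting from index 0.

```python
import numpy as np

N, dN = 12, 3                      # signal length and sampling interval (assumed)
M = N // dN                        # number of equidistant samples

def dft(n):
    # Unitary n x n DFT matrix with entries exp(+j*2*pi*t*k/n)/sqrt(n)
    idx = np.arange(n)
    return np.exp(2j * np.pi * np.outer(idx, idx) / n) / np.sqrt(n)

rng = np.random.default_rng(1)
lam = rng.uniform(0.0, 1.0, N)     # eigenvalues of K_x, kept in DFT order
U = dft(N)
K_x = U @ np.diag(lam) @ U.conj().T

H = np.eye(N)[::dN]                # M x N matrix selecting every dN-th sample
K_y = H @ K_x @ H.conj().T

# HU should equal dN copies of the M x M DFT matrix side by side, scaled by 1/sqrt(dN).
U_M = dft(M)
stacked = np.hstack([U_M] * dN) / np.sqrt(dN)

# Folded spectrum: lambda_{y,k} = (1/dN) * sum_i lambda_{iM+k}
lam_y = lam.reshape(dN, M).sum(axis=0) / dN
K_y_pred = U_M @ np.diag(lam_y) @ U_M.conj().T

print(np.allclose(H @ U, stacked), np.allclose(K_y, K_y_pred))
```

Row `i` of `lam.reshape(dN, M)` holds $\lambda_{iM}, \ldots, \lambda_{iM+M-1}$, so summing over rows adds exactly the eigenvalues that alias onto the same index $k$.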



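Now $K_y^{-1}$ can be expressed as 

\begin{align}
K_y^{-1} &= (U_M\Lambda_yU_M^{\dagger})^{-1} \tag{145}\\
&= U_M\,\mathrm{diag}\Big(\frac{1}{\lambda_{y,k}}\Big)U_M^{\dagger} \tag{146}\\
&= U_M\,\mathrm{diag}\Big(\frac{\Delta N}{\sum_{i=0}^{\Delta N-1}\lambda_{iM+k}}\Big)U_M^{\dagger}. \tag{147}
\end{align}

We note that since $K_y$ is assumed to be non-singular, $\lambda_{y,k} > 0$. We are now ready to consider the error 
expression in (136). We first consider the second term $\mathrm{tr}(\Lambda_xU^{\dagger}H^{\dagger}K_y^{-1}HU\Lambda_x)$: 

\begin{align}
\mathrm{tr}(\Lambda_xU^{\dagger}H^{\dagger}K_y^{-1}HU\Lambda_x) &= \mathrm{tr}\left(\frac{1}{\sqrt{\Delta N}}\begin{bmatrix}\Lambda_x^{0}U_M^{\dagger}\\ \vdots\\ \Lambda_x^{\Delta N-1}U_M^{\dagger}\end{bmatrix}(U_M\Lambda_y^{-1}U_M^{\dagger})\frac{1}{\sqrt{\Delta N}}[U_M\Lambda_x^{0}|\ldots|U_M\Lambda_x^{\Delta N-1}]\right) \tag{148}\\
&= \sum_{i=0}^{\Delta N-1}\frac{1}{\Delta N}\mathrm{tr}(\Lambda_x^{i}\Lambda_y^{-1}\Lambda_x^{i}) \tag{149}\\
&= \sum_{i=0}^{\Delta N-1}\sum_{k=0}^{M-1}\frac{\lambda_{iM+k}^2}{\sum_{l=0}^{\Delta N-1}\lambda_{lM+k}}. \tag{150}
\end{align}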



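Hence the MMSE becomes 

\begin{align}
E[\|x - E[x|y]\|^2] &= \sum_{t=0}^{N-1}\lambda_t - \sum_{i=0}^{\Delta N-1}\sum_{k=0}^{M-1}\frac{\lambda_{iM+k}^2}{\sum_{l=0}^{\Delta N-1}\lambda_{lM+k}} \tag{151}\\
&= \sum_{k=0}^{M-1}\sum_{i=0}^{\Delta N-1}\lambda_{iM+k} - \sum_{i=0}^{\Delta N-1}\sum_{k=0}^{M-1}\frac{\lambda_{iM+k}^2}{\sum_{l=0}^{\Delta N-1}\lambda_{lM+k}} \tag{152}\\
&= \sum_{k=0}^{M-1}\Big(\sum_{i=0}^{\Delta N-1}\lambda_{iM+k} - \sum_{i=0}^{\Delta N-1}\frac{\lambda_{iM+k}^2}{\sum_{l=0}^{\Delta N-1}\lambda_{lM+k}}\Big). \tag{153}
\end{align}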

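We note that we have now expressed the MMSE as the sum of the errors in $M$ frequency bands. Let 
us define the error at the $k$-th frequency band as 

\begin{equation}
e_k^{w} = \sum_{i=0}^{\Delta N-1}\lambda_{iM+k} - \sum_{i=0}^{\Delta N-1}\frac{\lambda_{iM+k}^2}{\sum_{l=0}^{\Delta N-1}\lambda_{lM+k}}, \quad 0 \le k \le M-1. \tag{154}
\end{equation}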
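As a numerical cross-check of the per-band expression, the sum of the band errors $e_k^w$ can be compared with the MMSE computed directly from $\mathrm{tr}(K_x - K_{xy}K_y^{-1}K_{xy}^{\dagger})$. The sketch below assumes noiseless equidistant sampling starting at index 0, strictly positive eigenvalues (so that $K_y$ is non-singular), and an arbitrary example spectrum; the function names are illustrative and not taken from the paper.

```python
import numpy as np

def band_errors(lam, dN):
    """Per-band errors e_k^w for eigenvalues lam (DFT order) and sampling interval dN."""
    M = len(lam) // dN
    blocks = np.asarray(lam, dtype=float).reshape(dN, M)   # row i holds lambda_{iM+k}
    total = blocks.sum(axis=0)                             # sum_l lambda_{lM+k}
    return total - (blocks ** 2).sum(axis=0) / total

def mmse_direct(lam, dN):
    """MMSE from tr(K_x - K_xy K_y^{-1} K_xy^H) with K_x = U diag(lam) U^H."""
    N = len(lam)
    idx = np.arange(N)
    U = np.exp(2j * np.pi * np.outer(idx, idx) / N) / np.sqrt(N)
    K_x = U @ np.diag(lam) @ U.conj().T
    H = np.eye(N)[::dN]                                    # equidistant sampling
    K_xy = K_x @ H.T                                       # E[x y^H]; H is real
    K_y = H @ K_x @ H.T
    return np.trace(K_x - K_xy @ np.linalg.solve(K_y, K_xy.conj().T)).real

lam = np.array([4.0, 3.0, 2.0, 1.0, 0.5, 0.25, 0.2, 0.1])  # example spectrum (assumed)
closed = band_errors(lam, 2).sum()
direct = mmse_direct(lam, 2)
print(closed, direct)
```

The two numbers should agree up to floating-point error for any spectrum with strictly positive folded sums.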

Example 7.1 Before moving on, we study a special case: Let $\Delta N = 2$. Then 

\begin{align}
e_k^{w} &= \lambda_k + \lambda_{\frac{N}{2}+k} - \frac{\lambda_k^2 + \lambda_{\frac{N}{2}+k}^2}{\lambda_k + \lambda_{\frac{N}{2}+k}} \tag{155}\\
&= \frac{2\lambda_k\lambda_{\frac{N}{2}+k}}{\lambda_k + \lambda_{\frac{N}{2}+k}}. \tag{156}
\end{align}

Hence $\frac{1}{e_k^{w}} = \frac{1}{2}\big(\frac{1}{\lambda_k} + \frac{1}{\lambda_{\frac{N}{2}+k}}\big)$. We note that this is the MMSE for the following single output multiple 
input system 

\begin{equation}
z^{k} = \begin{bmatrix}1 & 1\end{bmatrix}\begin{bmatrix}s_0^{k}\\ s_1^{k}\end{bmatrix} \tag{157}
\end{equation}

where $s^{k}\sim\mathcal{N}(0,K_{s^k})$, with $K_{s^k} = \mathrm{diag}(\lambda_k,\lambda_{\frac{N}{2}+k})$. Hence the random variables associated with the 
frequency components at $k$ and $\frac{N}{2}+k$ act as interference for estimating the other one. We observe 
that for estimating $x$ we have $\frac{N}{2}$ such channels in parallel. 

We may bound $e_k^{w}$ as 

\begin{align}
e_k^{w} = \frac{2\lambda_k\lambda_{\frac{N}{2}+k}}{\lambda_k + \lambda_{\frac{N}{2}+k}} &\le \frac{2\lambda_k\lambda_{\frac{N}{2}+k}}{\max(\lambda_k,\lambda_{\frac{N}{2}+k})} \tag{158}\\
&= 2\min(\lambda_k,\lambda_{\frac{N}{2}+k}). \tag{159}
\end{align}

This bound may be interpreted as follows: Through the scalar channel shown in (157), we would like to 
learn two random variables $s_0^{k}$ and $s_1^{k}$. The error of this channel is upper bounded by the error of the 
scheme where we only estimate the one with the largest variance, and do not try to estimate the variable 
with the small variance. In that scheme, one first makes an error of $\min(\lambda_k,\lambda_{\frac{N}{2}+k})$, since the variable 
with the small variance is ignored. We may lose another $\min(\lambda_k,\lambda_{\frac{N}{2}+k})$, since this variable acts as 
additive noise for estimating the variable with the large variance, and the MMSE associated with 
such a channel may be upper bounded by the variance of the noise. 
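The two-component case can be checked with a few lines of arithmetic. The sketch below, with arbitrarily chosen example variances, evaluates the error of estimating two independent zero-mean Gaussian variables with variances $a$ and $b$ from their sum, and compares it with the harmonic-mean form $2ab/(a+b)$ and with the bound $2\min(a,b)$. The function name `pair_error` is an illustrative assumption.

```python
def pair_error(a, b):
    """MMSE of estimating two independent Gaussians (variances a, b) from their sum."""
    return a + b - (a * a + b * b) / (a + b)

examples = [(1.0, 1.0), (4.0, 1.0), (10.0, 0.1)]   # example variances (assumed)
rows = []
for a, b in examples:
    e = pair_error(a, b)
    harmonic_form = 2 * a * b / (a + b)             # closed form of the same error
    bound = 2 * min(a, b)                           # upper bound on the error
    rows.append((a, b, e, harmonic_form, bound))
    print(a, b, e, harmonic_form, bound)
```

For variances 4 and 1 the error is 1.6, below the bound of 2; the bound becomes tight in relative terms as one variance dominates the other.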

Now we choose the set of indices $J$ with $|J| = N/2$ such that $k\in J \Leftrightarrow \frac{N}{2}+k\notin J$ and $J$ has the most 
power over all such sets, i.e. $k + \arg\max_{k_0\in\{0,N/2\}}\lambda_{k_0+k}\in J$, where $0 \le k \le N/2-1$. Let $P_J = \sum_{k\in J}\lambda_k$. 
Hence 

\begin{equation}
E[\|x - E[x|y]\|^2] = \sum_{k=0}^{N/2-1}e_k^{w} \le 2\sum_{k=0}^{N/2-1}\min(\lambda_k,\lambda_{\frac{N}{2}+k}) = 2(P - P_J). \tag{160}
\end{equation}

We observe that the error is upper bounded by $2\times$ (the power in the "ignored band"). 

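We now return to the general case. Although it is possible to consider any set $J$ that satisfies the 
assumptions stated in (93), for notational convenience we choose the set $J = \{0,\ldots,M-1\}$. Of course, 
in general one would look for the set $J$ that has most of the power in order to have a better bound on 
the error. 

We now consider 

\begin{equation}
e_k^{w} = \sum_{i=0}^{\Delta N-1}\lambda_{iM+k} - \sum_{i=0}^{\Delta N-1}\frac{\lambda_{iM+k}^2}{\sum_{l=0}^{\Delta N-1}\lambda_{lM+k}}, \quad 0 \le k \le M-1. \tag{161}
\end{equation}

We note that this is the MMSE of estimating $s^{k}$ from the output of the following single output 
multiple input system 

\begin{equation}
z^{k} = \begin{bmatrix}1 & \cdots & 1\end{bmatrix}\begin{bmatrix}s_0^{k}\\ \vdots\\ s_{\Delta N-1}^{k}\end{bmatrix} \tag{162}
\end{equation}

where $s^{k}\sim\mathcal{N}(0,K_{s^k})$, with $K_{s^k} = \mathrm{diag}(\sigma_{s_i^k}^2) = \mathrm{diag}(\lambda_k,\ldots,\lambda_{iM+k},\ldots,\lambda_{(\Delta N-1)M+k})$. We define 

\begin{equation}
P^{k} = \sum_{l=0}^{\Delta N-1}\lambda_{lM+k}, \quad 0 \le k \le M-1. \tag{163}
\end{equation}

We note that $\sum_{k=0}^{M-1}P^{k} = P$. 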

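We now bound $e_k^{w}$ as in the $\Delta N = 2$ example: 

\begin{align}
e_k^{w} &= \sum_{i=0}^{\Delta N-1}\lambda_{iM+k} - \sum_{i=0}^{\Delta N-1}\frac{\lambda_{iM+k}^2}{\sum_{l=0}^{\Delta N-1}\lambda_{lM+k}} \tag{164}\\
&= \sum_{i=0}^{\Delta N-1}\Big(\lambda_{iM+k} - \frac{\lambda_{iM+k}^2}{P^{k}}\Big) \tag{165}\\
&= \Big(\lambda_k - \frac{\lambda_k^2}{P^{k}}\Big) + \sum_{i=1}^{\Delta N-1}\Big(\lambda_{iM+k} - \frac{\lambda_{iM+k}^2}{P^{k}}\Big) \tag{166}\\
&\le (P^{k} - \lambda_k) + \sum_{i=1}^{\Delta N-1}\lambda_{iM+k} \tag{167}\\
&= (P^{k} - \lambda_k) + P^{k} - \lambda_k \tag{168}\\
&= 2(P^{k} - \lambda_k) \tag{169}
\end{align}

where we have used $\lambda_k - \frac{\lambda_k^2}{P^{k}} = \frac{\lambda_k}{P^{k}}(P^{k} - \lambda_k) \le P^{k} - \lambda_k$ since $0 \le \frac{\lambda_k}{P^{k}} \le 1$, and $\lambda_{iM+k} - \frac{\lambda_{iM+k}^2}{P^{k}} \le \lambda_{iM+k}$ since 
$\frac{\lambda_{iM+k}^2}{P^{k}} \ge 0$. This upper bound may be interpreted similarly to Example 7.1: the error is upper bounded 
by the error of the scheme where one estimates the random variable associated with $\lambda_k$ and ignores the 
others. 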

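The total error is bounded by 

\begin{align}
E[\|x - E[x|y]\|^2] = \sum_{k=0}^{M-1}e_k^{w} &\le \sum_{k=0}^{M-1}2(P^{k} - \lambda_k) \tag{170}\\
&= 2\Big(\sum_{k=0}^{M-1}P^{k} - \sum_{k=0}^{M-1}\lambda_k\Big) \tag{171}\\
&= 2(P - P_J). \tag{172}
\end{align}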
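The bound $2(P - P_J)$ can be illustrated on small spectra. The sketch below is written under stated assumptions: instead of the notationally convenient choice $J = \{0,\ldots,M-1\}$, it picks the strongest index in each band, which also yields a set whose elements do not overlap when shifted by $M$; bands with zero total power are taken to contribute zero error. The function name and example spectra are illustrative.

```python
def mmse_and_bound(lam, dN):
    """Return (sum of band errors, 2*(P - P_J)) with J = strongest index per band."""
    M = len(lam) // dN
    bands = [[lam[i * M + k] for i in range(dN)] for k in range(M)]
    mmse = 0.0
    for band in bands:
        power = sum(band)
        if power > 0:                                   # empty bands contribute no error
            mmse += power - sum(v * v for v in band) / power
    P = sum(lam)
    P_J = sum(max(band) for band in bands)              # power captured by J
    return mmse, 2 * (P - P_J)

# Spectrum with decaying eigenvalues: the error is positive and below the bound.
decaying = [4.0, 3.0, 2.0, 1.0, 0.5, 0.25, 0.2, 0.1]
# Spectrum that is neither low-pass nor high-pass, yet occupies one index per band.
scattered = [1.0, 0.0, 2.0, 0.0, 0.0, 3.0, 0.0, 4.0]

for lam in (decaying, scattered):
    print(mmse_and_bound(lam, 2))
```

For the second spectrum both the error and the bound vanish, which matches the observation that zero error does not require the signal to be bandlimited in the conventional sense.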



Remark 7.5 We now consider the case where $K_y$ may be singular. In this case, it is enough to use $K_y^{+}$ 
instead of $K_y^{-1}$, where $+$ denotes the Moore-Penrose pseudo-inverse [21, Ch.2]. Hence the MMSE may 
be expressed as $\mathrm{tr}(K_x - K_{xy}K_y^{+}K_{xy}^{\dagger})$. We have $K_y^{+} = (U_M\Lambda_yU_M^{\dagger})^{+} = U_M\Lambda_y^{+}U_M^{\dagger} = U_M\,\mathrm{diag}(\lambda_{y,k}^{+})U_M^{\dagger}$, 
where $\lambda_{y,k}^{+} = 0$ if $\lambda_{y,k} = 0$ and $\lambda_{y,k}^{+} = \frac{1}{\lambda_{y,k}}$ otherwise. Going through the calculations with $K_y^{+}$ instead of 
$K_y^{-1}$ reveals that the error expression remains essentially the same: 

\begin{equation}
E[\|x - E[x|y]\|^2] = \sum_{k\in J_0}\Big(\sum_{i=0}^{\Delta N-1}\lambda_{iM+k} - \sum_{i=0}^{\Delta N-1}\frac{\lambda_{iM+k}^2}{\sum_{l=0}^{\Delta N-1}\lambda_{lM+k}}\Big), \tag{173}
\end{equation}

where $J_0 = \{k : \sum_{l=0}^{\Delta N-1}\lambda_{lM+k} \neq 0,\ 0 \le k \le M-1\} \subseteq \{0,\ldots,M-1\}$. We note that $\Delta N\lambda_{y,k} = 
\sum_{l=0}^{\Delta N-1}\lambda_{lM+k} = P^{k}$. 